*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
using world size: 128, data-parallel-size: 2, tensor-model-parallel size: 4, pipeline-model-parallel size: 16
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.95
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. None
consumed_train_samples .......................... 0
consumed_train_tokens ........................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
curriculum_learning ............................. False
data_impl ....................................... mmap
data_parallel_size .............................. 2
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.1518981.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embed_layernorm ................................. True
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 150
eval_iters ...................................... 5
eval_only ....................................... None
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1185
exit_interval ................................... None
ffn_hidden_size ................................. 46400
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
gigaflos_no_embeds .............................. 0
global_batch_size ............................... 2048
glu_activation .................................. None
hidden_dropout .................................. 0.1
hidden_size ..................................... 11600
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.006
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 145
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 1
log_learning_rate_to_tensorboard ................ True
log_level ....................................... None
log_level_replica ............................... None
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_on_targets_only ............................ False
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 6e-05
lr_decay_iters .................................. None
lr_decay_samples ................................ None
lr_decay_style .................................. cosine
lr_decay_tokens ................................. 260000000000
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 3750000
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... 2048
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt
micro_batch_size ................................ 1
min_loss_scale .................................. 1.0
min_lr .......................................... 6e-06
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 80
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 64
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 16
position_embedding_type ......................... PositionEmbeddingType.absolute
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... None
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
reweight_loss_based_on_position_frequency ....... False
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
save_interval ................................... 200
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 43
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
skip_train_iteration_range ...................... None
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 4
tensorboard_debug_dir ........................... None
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs//tensorboard/cl-a100
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
test_weighted_split_names ....................... None
test_weighted_split_paths ....................... None
test_weighted_split_splits ...................... None
test_weighted_split_weights ..................... None
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 600000000
train_tokens .................................... 300000000000
train_weighted_split_paths ...................... None
use_bnb_optimizer ............................... False
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
valid_weighted_split_names ...................... None
valid_weighted_split_paths ...................... None
valid_weighted_split_splits ..................... None
valid_weighted_split_weights .................... None
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 128
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
setting number of micro-batches to constant 1024
> building GPT2BPETokenizer tokenizer ...
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
> setting tensorboard ...
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed']
deepspeed info ................... 0.6.0+2d51f617, 2d51f617, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
**** Git info for Megatron: git_hash=f689231 git_branch=log-grad-norm ****
> initializing torch distributed ...
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 16
> setting random seeds to 43 ...
[2022-01-26 18:41:57,716] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2761 and data parallel seed: 43
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.148 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] /gpfslocalsys/cuda/11.2/bin/nvcc --generate-dependencies-with-compile --dependency-output scaled_upper_triang_masked_softmax_cuda.cuda.o.d -DTORCH_EXTENSION_NAME=scaled_upper_triang_masked_softmax_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include/TH -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include/THC -isystem /gpfslocalsys/cuda/11.2/include -isystem /gpfswork/rech/six/commun/conda/cutting-edge/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -O3 --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda -std=c++14 -c /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_cuda.cu -o scaled_upper_triang_masked_softmax_cuda.cuda.o
[2/2] c++ scaled_upper_triang_masked_softmax.o scaled_upper_triang_masked_softmax_cuda.cuda.o -shared -L/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/gpfslocalsys/cuda/11.2/lib64 -lcudart -o scaled_upper_triang_masked_softmax_cuda.so
Loading extension module scaled_upper_triang_masked_softmax_cuda...
/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] /gpfslocalsys/cuda/11.2/bin/nvcc --generate-dependencies-with-compile --dependency-output scaled_masked_softmax_cuda.cuda.o.d -DTORCH_EXTENSION_NAME=scaled_masked_softmax_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include/TH -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include/THC -isystem /gpfslocalsys/cuda/11.2/include -isystem /gpfswork/rech/six/commun/conda/cutting-edge/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -O3 --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda -std=c++14 -c /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_cuda.cu -o scaled_masked_softmax_cuda.cuda.o
/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h(320): warning: variable "batch_count" was declared but never referenced
[2/2] c++ scaled_masked_softmax.o scaled_masked_softmax_cuda.cuda.o -shared -L/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/gpfslocalsys/cuda/11.2/lib64 -lcudart -o scaled_masked_softmax_cuda.so
Loading extension module scaled_masked_softmax_cuda...
/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] /gpfslocalsys/cuda/11.2/bin/nvcc --generate-dependencies-with-compile --dependency-output layer_norm_cuda_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=fused_mix_prec_layer_norm_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include/TH -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include/THC -isystem /gpfslocalsys/cuda/11.2/include -isystem /gpfswork/rech/six/commun/conda/cutting-edge/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -O3 --use_fast_math -maxrregcount=50 -std=c++14 -c /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda_kernel.cu -o layer_norm_cuda_kernel.cuda.o
[2/2] c++ layer_norm_cuda.o layer_norm_cuda_kernel.cuda.o -shared -L/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/gpfslocalsys/cuda/11.2/lib64 -lcudart -o fused_mix_prec_layer_norm_cuda.so
Loading extension module fused_mix_prec_layer_norm_cuda...
>>> done with compiling and loading fused kernels. Compilation time: 76.543 seconds
time to initialize megatron (seconds): 50.409
[after megatron is initialized] datetime: 2022-01-26 18:43:14
building GPT model ...
[2022-01-26 18:43:14,410] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1
[2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 
[2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 
[2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 
[2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:43:14,411] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 
[2022-01-26 18:43:14,412] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1
[2022-01-26 18:43:14,443] [INFO] [utils.py:824:see_memory_usage] Before Building Model
[2022-01-26 18:43:14,444] [INFO] 
[utils.py:825:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB [2022-01-26 18:43:14,444] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 39.32 GB, percent = 7.8% [2022-01-26 18:43:14,444] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=1, data=0, model=0): 8, ProcessCoord(pipe=1, data=0, model=1): 9, ProcessCoord(pipe=1, data=0, model=2): 10, ProcessCoord(pipe=1, data=0, model=3): 11, ProcessCoord(pipe=1, data=1, model=0): 12, ProcessCoord(pipe=1, data=1, model=1): 13, ProcessCoord(pipe=1, data=1, model=2): 14, ProcessCoord(pipe=1, data=1, model=3): 15, ProcessCoord(pipe=2, data=0, model=0): 16, ProcessCoord(pipe=2, data=0, model=1): 17, ProcessCoord(pipe=2, data=0, model=2): 18, ProcessCoord(pipe=2, data=0, model=3): 19, ProcessCoord(pipe=2, data=1, model=0): 20, ProcessCoord(pipe=2, data=1, model=1): 21, ProcessCoord(pipe=2, data=1, model=2): 22, ProcessCoord(pipe=2, data=1, model=3): 23, ProcessCoord(pipe=3, data=0, model=0): 24, ProcessCoord(pipe=3, data=0, model=1): 25, ProcessCoord(pipe=3, data=0, model=2): 26, ProcessCoord(pipe=3, data=0, model=3): 27, ProcessCoord(pipe=3, data=1, model=0): 28, ProcessCoord(pipe=3, data=1, model=1): 29, ProcessCoord(pipe=3, data=1, model=2): 30, ProcessCoord(pipe=3, data=1, model=3): 31, ProcessCoord(pipe=4, data=0, model=0): 32, ProcessCoord(pipe=4, data=0, model=1): 33, ProcessCoord(pipe=4, data=0, model=2): 34, ProcessCoord(pipe=4, data=0, model=3): 35, ProcessCoord(pipe=4, data=1, model=0): 36, ProcessCoord(pipe=4, data=1, 
model=1): 37, ProcessCoord(pipe=4, data=1, model=2): 38, ProcessCoord(pipe=4, data=1, model=3): 39, ProcessCoord(pipe=5, data=0, model=0): 40, ProcessCoord(pipe=5, data=0, model=1): 41, ProcessCoord(pipe=5, data=0, model=2): 42, ProcessCoord(pipe=5, data=0, model=3): 43, ProcessCoord(pipe=5, data=1, model=0): 44, ProcessCoord(pipe=5, data=1, model=1): 45, ProcessCoord(pipe=5, data=1, model=2): 46, ProcessCoord(pipe=5, data=1, model=3): 47, ProcessCoord(pipe=6, data=0, model=0): 48, ProcessCoord(pipe=6, data=0, model=1): 49, ProcessCoord(pipe=6, data=0, model=2): 50, ProcessCoord(pipe=6, data=0, model=3): 51, ProcessCoord(pipe=6, data=1, model=0): 52, ProcessCoord(pipe=6, data=1, model=1): 53, ProcessCoord(pipe=6, data=1, model=2): 54, ProcessCoord(pipe=6, data=1, model=3): 55, ProcessCoord(pipe=7, data=0, model=0): 56, ProcessCoord(pipe=7, data=0, model=1): 57, ProcessCoord(pipe=7, data=0, model=2): 58, ProcessCoord(pipe=7, data=0, model=3): 59, ProcessCoord(pipe=7, data=1, model=0): 60, ProcessCoord(pipe=7, data=1, model=1): 61, ProcessCoord(pipe=7, data=1, model=2): 62, ProcessCoord(pipe=7, data=1, model=3): 63, ProcessCoord(pipe=8, data=0, model=0): 64, ProcessCoord(pipe=8, data=0, model=1): 65, ProcessCoord(pipe=8, data=0, model=2): 66, ProcessCoord(pipe=8, data=0, model=3): 67, ProcessCoord(pipe=8, data=1, model=0): 68, ProcessCoord(pipe=8, data=1, model=1): 69, ProcessCoord(pipe=8, data=1, model=2): 70, ProcessCoord(pipe=8, data=1, model=3): 71, ProcessCoord(pipe=9, data=0, model=0): 72, ProcessCoord(pipe=9, data=0, model=1): 73, ProcessCoord(pipe=9, data=0, model=2): 74, ProcessCoord(pipe=9, data=0, model=3): 75, ProcessCoord(pipe=9, data=1, model=0): 76, ProcessCoord(pipe=9, data=1, model=1): 77, ProcessCoord(pipe=9, data=1, model=2): 78, ProcessCoord(pipe=9, data=1, model=3): 79, ProcessCoord(pipe=10, data=0, model=0): 80, ProcessCoord(pipe=10, data=0, model=1): 81, ProcessCoord(pipe=10, data=0, model=2): 82, ProcessCoord(pipe=10, data=0, model=3): 83, 
ProcessCoord(pipe=10, data=1, model=0): 84, ProcessCoord(pipe=10, data=1, model=1): 85, ProcessCoord(pipe=10, data=1, model=2): 86, ProcessCoord(pipe=10, data=1, model=3): 87, ProcessCoord(pipe=11, data=0, model=0): 88, ProcessCoord(pipe=11, data=0, model=1): 89, ProcessCoord(pipe=11, data=0, model=2): 90, ProcessCoord(pipe=11, data=0, model=3): 91, ProcessCoord(pipe=11, data=1, model=0): 92, ProcessCoord(pipe=11, data=1, model=1): 93, ProcessCoord(pipe=11, data=1, model=2): 94, ProcessCoord(pipe=11, data=1, model=3): 95, ProcessCoord(pipe=12, data=0, model=0): 96, ProcessCoord(pipe=12, data=0, model=1): 97, ProcessCoord(pipe=12, data=0, model=2): 98, ProcessCoord(pipe=12, data=0, model=3): 99, ProcessCoord(pipe=12, data=1, model=0): 100, ProcessCoord(pipe=12, data=1, model=1): 101, ProcessCoord(pipe=12, data=1, model=2): 102, ProcessCoord(pipe=12, data=1, model=3): 103, ProcessCoord(pipe=13, data=0, model=0): 104, ProcessCoord(pipe=13, data=0, model=1): 105, ProcessCoord(pipe=13, data=0, model=2): 106, ProcessCoord(pipe=13, data=0, model=3): 107, ProcessCoord(pipe=13, data=1, model=0): 108, ProcessCoord(pipe=13, data=1, model=1): 109, ProcessCoord(pipe=13, data=1, model=2): 110, ProcessCoord(pipe=13, data=1, model=3): 111, ProcessCoord(pipe=14, data=0, model=0): 112, ProcessCoord(pipe=14, data=0, model=1): 113, ProcessCoord(pipe=14, data=0, model=2): 114, ProcessCoord(pipe=14, data=0, model=3): 115, ProcessCoord(pipe=14, data=1, model=0): 116, ProcessCoord(pipe=14, data=1, model=1): 117, ProcessCoord(pipe=14, data=1, model=2): 118, ProcessCoord(pipe=14, data=1, model=3): 119, ProcessCoord(pipe=15, data=0, model=0): 120, ProcessCoord(pipe=15, data=0, model=1): 121, ProcessCoord(pipe=15, data=0, model=2): 122, ProcessCoord(pipe=15, data=0, model=3): 123, ProcessCoord(pipe=15, data=1, model=0): 124, ProcessCoord(pipe=15, data=1, model=1): 125, ProcessCoord(pipe=15, data=1, model=2): 126, ProcessCoord(pipe=15, data=1, model=3): 127} [2022-01-26 18:43:15,526] [INFO] 
[module.py:365:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=7
     0: _to_float16
     1: EmbeddingPipe
     2: 
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
stage=1 layers=4
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
stage=2 layers=4
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=3 layers=4
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
stage=4 layers=4
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
stage=5 layers=4
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
stage=6 layers=4
    27: ParallelTransformerLayerPipe
    28: ParallelTransformerLayerPipe
    29: ParallelTransformerLayerPipe
    30: ParallelTransformerLayerPipe
stage=7 layers=4
    31: ParallelTransformerLayerPipe
    32: ParallelTransformerLayerPipe
    33: ParallelTransformerLayerPipe
    34: ParallelTransformerLayerPipe
stage=8 layers=4
    35: ParallelTransformerLayerPipe
    36: ParallelTransformerLayerPipe
    37: ParallelTransformerLayerPipe
    38: ParallelTransformerLayerPipe
stage=9 layers=4
    39: ParallelTransformerLayerPipe
    40: ParallelTransformerLayerPipe
    41: ParallelTransformerLayerPipe
    42: ParallelTransformerLayerPipe
stage=10 layers=4
    43: ParallelTransformerLayerPipe
    44: ParallelTransformerLayerPipe
    45: ParallelTransformerLayerPipe
    46: ParallelTransformerLayerPipe
stage=11 layers=4
    47: ParallelTransformerLayerPipe
    48: ParallelTransformerLayerPipe
    49: ParallelTransformerLayerPipe
    50: ParallelTransformerLayerPipe
stage=12 layers=4
    51: ParallelTransformerLayerPipe
    52: ParallelTransformerLayerPipe
    53: ParallelTransformerLayerPipe
    54: ParallelTransformerLayerPipe
stage=13 layers=4
    55: ParallelTransformerLayerPipe
    56: ParallelTransformerLayerPipe
    57: ParallelTransformerLayerPipe
    58: ParallelTransformerLayerPipe
stage=14 layers=4
    59: ParallelTransformerLayerPipe
    60: ParallelTransformerLayerPipe
    61: ParallelTransformerLayerPipe
    62: ParallelTransformerLayerPipe
stage=15 layers=8
    63: ParallelTransformerLayerPipe
    64: ParallelTransformerLayerPipe
    65: ParallelTransformerLayerPipe
    66: ParallelTransformerLayerPipe
    67: 
    68: MixedFusedLayerNorm
    69: EmbeddingPipe
    70: float16_to_fp32
  loss: CrossEntropy
[2022-01-26 18:43:16,313] [INFO] [utils.py:824:see_memory_usage] After Building Model
[2022-01-26 18:43:16,314] [INFO] [utils.py:825:see_memory_usage] MA 3.38 GB Max_MA 3.38 GB CA 3.43 GB Max_CA 3 GB
[2022-01-26 18:43:16,314] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 39.65 GB, percent = 7.9%
setting training iterations to 292968
> learning rate decay style: cosine
DeepSpeed is enabled.
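Note on the "Using topology" mapping logged above: the printed entries are consistent with a row-major layout over a 16 x 2 x 4 grid (pipe, data, model), where the pipe index varies slowest and the model index fastest. The sketch below reproduces that arithmetic as inferred from the printed entries only. The names Coord, coord_to_rank and rank_to_coord are hypothetical helpers for illustration and are not DeepSpeed APIs.

```python
from typing import NamedTuple

# Axis sizes read off the logged topology: pipe runs 0..15, data 0..1, model 0..3.
PIPE_SIZE, DATA_SIZE, MODEL_SIZE = 16, 2, 4


class Coord(NamedTuple):
    """Illustrative stand-in for the logged ProcessCoord entries (hypothetical name)."""
    pipe: int
    data: int
    model: int


def coord_to_rank(coord: Coord) -> int:
    """Row-major flattening: pipe varies slowest, model varies fastest."""
    return (coord.pipe * DATA_SIZE + coord.data) * MODEL_SIZE + coord.model


def rank_to_coord(rank: int) -> Coord:
    """Inverse of coord_to_rank for ranks inside the 128-process grid."""
    if not 0 <= rank < PIPE_SIZE * DATA_SIZE * MODEL_SIZE:
        raise ValueError(f"rank {rank} is outside the 128-process grid")
    stage_and_replica, model = divmod(rank, MODEL_SIZE)
    pipe, data = divmod(stage_and_replica, DATA_SIZE)
    return Coord(pipe, data, model)


if __name__ == "__main__":
    # Spot checks against entries printed in the log.
    print(coord_to_rank(Coord(pipe=4, data=1, model=1)))  # logged as rank 37
    print(rank_to_coord(84))  # logged as pipe=10, data=1, model=0
```

Under this layout, ranks 0-7 belong to pipeline stage 0 and ranks 120-127 to stage 15. Those are the two stages that hold an EmbeddingPipe in the partition listing, which is consistent with those ranks reporting larger partition sizes further down in the log.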
[2022-01-26 18:43:16,403] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed info: version=0.6.0+2d51f617, git-hash=2d51f617, git-branch=master [2022-01-26 18:43:17,230] [INFO] [engine.py:275:__init__] DeepSpeed Flops Profiler Enabled: False [2022-01-26 18:43:17,230] [INFO] [engine.py:1093:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer [2022-01-26 18:43:17,230] [INFO] [engine.py:1099:_configure_optimizer] Using client Optimizer as basic optimizer [2022-01-26 18:43:17,231] [INFO] [engine.py:1115:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam [2022-01-26 18:43:17,231] [INFO] [utils.py:44:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type= [2022-01-26 18:43:17,231] [INFO] [logging.py:69:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer [2022-01-26 18:43:17,231] [INFO] [stage_1_and_2.py:121:__init__] Reduce bucket size 500000000 [2022-01-26 18:43:17,231] [INFO] [stage_1_and_2.py:122:__init__] Allgather bucket size 500000000 [2022-01-26 18:43:17,231] [INFO] [stage_1_and_2.py:123:__init__] CPU Offload: False [2022-01-26 18:43:17,231] [INFO] [stage_1_and_2.py:124:__init__] Round robin gradient partitioning: False Rank: 69 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 68 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 64 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 65 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 19 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 18 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 57 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 56 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 50 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 78 partition count [2, 2] and sizes[(807360000, False), (179800, 
False)] Rank: 51 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 22 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 23 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 10 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 55 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 11 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 67 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 66 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 54 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 15 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 60 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 14 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 49 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 48 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 70 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 71 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 52 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 53 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 32 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 33 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 63 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 61 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 62 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 76 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 77 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 79 
partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 26 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 27 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 83 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 82 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 112 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 45 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 44 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 113 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 97 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 96 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 20 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 21 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 100 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 101 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 117 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 116 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 58 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 59 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 17 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 16 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 114 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 115 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 99 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 98 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 108 partition count 
[2, 2] and sizes[(807360000, False), (179800, False)] Rank: 103 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 106 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 102 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 43 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 41 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 40 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 109 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 119 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 111 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 105 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 107 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 110 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 47 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 104 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 42 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 118 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 46 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 34 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 35 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 81 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 80 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 88 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 8 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 94 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 89 partition count [2, 2] and 
sizes[(807360000, False), (179800, False)] Rank: 85 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 84 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 91 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 9 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 90 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 95 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 124 partition count [2, 2] and sizes[(892741800, False), (197200, False)] Rank: 125 partition count [2, 2] and sizes[(892741800, False), (197200, False)] Rank: 121 partition count [2, 2] and sizes[(892741800, False), (197200, False)] Rank: 126 partition count [2, 2] and sizes[(892741800, False), (197200, False)] Rank: 122 partition count [2, 2] and sizes[(892741800, False), (197200, False)] Rank: 123 partition count [2, 2] and sizes[(892741800, False), (197200, False)] Rank: 120 partition count [2, 2] and sizes[(892741800, False), (197200, False)] Rank: 127 partition count [2, 2] and sizes[(892741800, False), (197200, False)] Rank: 5 partition count [2, 2] and sizes[(892741800, False), (185600, False)] Rank: 4 partition count [2, 2] and sizes[(892741800, False), (185600, False)] Rank: 31 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 30 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 38 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 39 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 29 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 74 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 75 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 28 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 1 partition count [2, 2] and sizes[(892741800, 
False), (185600, False)] Rank: 0 partition count [2, 2] and sizes[(892741800, False), (185600, False)] Rank: 12 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 13 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 93 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 92 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 25 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 24 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 72 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 73 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 86 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 87 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 37 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 36 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 3 partition count [2, 2] and sizes[(892741800, False), (185600, False)] Rank: 2 partition count [2, 2] and sizes[(892741800, False), (185600, False)] Rank: 6 partition count [2, 2] and sizes[(892741800, False), (185600, False)] Rank: 7 partition count [2, 2] and sizes[(892741800, False), (185600, False)] [2022-01-26 18:43:26,293] [INFO] [utils.py:824:see_memory_usage] Before initializing optimizer states [2022-01-26 18:43:26,294] [INFO] [utils.py:825:see_memory_usage] MA 6.66 GB Max_MA 8.32 GB CA 11.79 GB Max_CA 12 GB [2022-01-26 18:43:26,294] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 39.73 GB, percent = 7.9% [2022-01-26 18:43:26,368] [INFO] [utils.py:824:see_memory_usage] After initializing optimizer states [2022-01-26 18:43:26,368] [INFO] [utils.py:825:see_memory_usage] MA 13.31 GB Max_MA 16.64 GB CA 21.77 GB Max_CA 22 GB [2022-01-26 18:43:26,368] [INFO] [utils.py:833:see_memory_usage] CPU Virtual 
Memory: used = 39.73 GB, percent = 7.9% [2022-01-26 18:43:26,368] [INFO] [stage_1_and_2.py:493:__init__] optimizer state initialized [2022-01-26 18:43:26,387] [INFO] [utils.py:824:see_memory_usage] After initializing ZeRO optimizer [2022-01-26 18:43:26,387] [INFO] [utils.py:825:see_memory_usage] MA 13.31 GB Max_MA 13.31 GB CA 21.77 GB Max_CA 22 GB [2022-01-26 18:43:26,388] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 39.76 GB, percent = 7.9% [2022-01-26 18:43:26,388] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam [2022-01-26 18:43:26,388] [INFO] [engine.py:805:_configure_lr_scheduler] DeepSpeed using client LR scheduler [2022-01-26 18:43:26,388] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed LR Scheduler = [2022-01-26 18:43:26,388] [INFO] [logging.py:69:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.95), (0.9, 0.95)] [2022-01-26 18:43:26,388] [INFO] [config.py:1058:print] DeepSpeedEngine configuration: [2022-01-26 18:43:26,388] [INFO] [config.py:1062:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2022-01-26 18:43:26,388] [INFO] [config.py:1062:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2022-01-26 18:43:26,388] [INFO] [config.py:1062:print] amp_enabled .................. False [2022-01-26 18:43:26,388] [INFO] [config.py:1062:print] amp_params ................... False [2022-01-26 18:43:26,388] [INFO] [config.py:1062:print] autotuning_config ............ 
{ "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": null, "exps_dir": null, "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 } [2022-01-26 18:43:26,388] [INFO] [config.py:1062:print] bfloat16_enabled ............. False [2022-01-26 18:43:26,388] [INFO] [config.py:1062:print] checkpoint_tag_validation_enabled True [2022-01-26 18:43:26,388] [INFO] [config.py:1062:print] checkpoint_tag_validation_fail False [2022-01-26 18:43:26,388] [INFO] [config.py:1062:print] communication_data_type ...... None [2022-01-26 18:43:26,388] [INFO] [config.py:1062:print] curriculum_enabled ........... True [2022-01-26 18:43:26,388] [INFO] [config.py:1062:print] curriculum_params ............ {'curriculum_type': 'seqlen', 'min_difficulty': 64, 'max_difficulty': 2048, 'schedule_type': 'fixed_linear', 'schedule_config': {'total_curriculum_step': 36000, 'difficulty_step': 8}} [2022-01-26 18:43:26,388] [INFO] [config.py:1062:print] dataloader_drop_last ......... False [2022-01-26 18:43:26,388] [INFO] [config.py:1062:print] disable_allgather ............ False [2022-01-26 18:43:26,388] [INFO] [config.py:1062:print] dump_state ................... False [2022-01-26 18:43:26,388] [INFO] [config.py:1062:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1} [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] eigenvalue_enabled ........... 
False [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] eigenvalue_gas_boundary_resolution 1 [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] eigenvalue_layer_name ........ bert.encoder.layer [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] eigenvalue_layer_num ......... 0 [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] eigenvalue_max_iter .......... 100 [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] eigenvalue_stability ......... 1e-06 [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] eigenvalue_tol ............... 0.01 [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] eigenvalue_verbose ........... False [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] elasticity_enabled ........... False [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] fp16_enabled ................. True [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] fp16_master_weights_and_gradients False [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] fp16_mixed_quantize .......... False [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] global_rank .................. 0 [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] gradient_accumulation_steps .. 1024 [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] gradient_clipping ............ 1.0 [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] gradient_predivide_factor .... 1.0 [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] initial_dynamic_scale ........ 4096 [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] loss_scale ................... 0 [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] memory_breakdown ............. False [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] optimizer_legacy_fusion ...... 
False [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] optimizer_name ............... None [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] optimizer_params ............. None [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] pld_enabled .................. False [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] pld_params ................... False [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] prescale_gradients ........... False [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] quantize_change_rate ......... 0.001 [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] quantize_groups .............. 1 [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] quantize_offset .............. 1000 [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] quantize_period .............. 1000 [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] quantize_rounding ............ 0 [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] quantize_start_bits .......... 16 [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] quantize_target_bits ......... 8 [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] quantize_training_enabled .... False [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] quantize_type ................ 0 [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] quantize_verbose ............. False [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] scheduler_name ............... None [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] scheduler_params ............. None [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] sparse_attention ............. None [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] sparse_gradients_enabled ..... 
False [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] steps_per_print .............. 2000 [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] tensorboard_enabled .......... False [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] tensorboard_job_name ......... DeepSpeedJobName [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] tensorboard_output_path ...... [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] train_batch_size ............. 2048 [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] train_micro_batch_size_per_gpu 1 [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] use_quantizer_kernel ......... False [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] wall_clock_breakdown ......... False [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] world_size ................... 2 [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] zero_allow_untested_optimizer False [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] zero_config .................. { "stage": 1, "contiguous_gradients": true, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": false, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false } [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] zero_enabled ................. True [2022-01-26 18:43:26,389] [INFO] [config.py:1062:print] zero_optimization_stage ...... 
1 [2022-01-26 18:43:26,390] [INFO] [config.py:1064:print] json = { "train_micro_batch_size_per_gpu": 1, "train_batch_size": 2.048000e+03, "gradient_clipping": 1.0, "elastic_checkpoint": true, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "curriculum_learning": { "enabled": true, "curriculum_type": "seqlen", "min_difficulty": 64, "max_difficulty": 2.048000e+03, "schedule_type": "fixed_linear", "schedule_config": { "total_curriculum_step": 3.600000e+04, "difficulty_step": 8 } }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false } [2022-01-26 18:43:26,390] [INFO] [engine.py:93:__init__] CONFIG: micro_batches=1024 micro_batch_size=1 [2022-01-26 18:43:27,512] [INFO] [engine.py:151:__init__] RANK=0 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1785854800 (1785.855M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,512] [INFO] [engine.py:151:__init__] RANK=2 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1785854800 (1785.855M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,512] [INFO] [engine.py:151:__init__] RANK=3 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1785854800 (1785.855M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,512] [INFO] [engine.py:151:__init__] RANK=1 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1785854800 (1785.855M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,512] [INFO] [engine.py:151:__init__] RANK=65 STAGE=8 LAYERS=4 [35, 39) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,512] [INFO] [engine.py:151:__init__] RANK=64 STAGE=8 LAYERS=4 [35, 39) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) 
UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,512] [INFO] [engine.py:151:__init__] RANK=67 STAGE=8 LAYERS=4 [35, 39) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,512] [INFO] [engine.py:151:__init__] RANK=66 STAGE=8 LAYERS=4 [35, 39) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,512] [INFO] [engine.py:151:__init__] RANK=96 STAGE=12 LAYERS=4 [51, 55) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,512] [INFO] [engine.py:151:__init__] RANK=98 STAGE=12 LAYERS=4 [51, 55) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,512] [INFO] [engine.py:151:__init__] RANK=99 STAGE=12 LAYERS=4 [51, 55) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,512] [INFO] [engine.py:151:__init__] RANK=35 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,512] [INFO] [engine.py:151:__init__] RANK=34 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,512] [INFO] [engine.py:151:__init__] RANK=32 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=81 STAGE=10 LAYERS=4 [43, 47) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=83 
STAGE=10 LAYERS=4 [43, 47) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=33 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=80 STAGE=10 LAYERS=4 [43, 47) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=82 STAGE=10 LAYERS=4 [43, 47) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=97 STAGE=12 LAYERS=4 [51, 55) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=17 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=16 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=19 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=113 STAGE=14 LAYERS=4 [59, 63) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=112 STAGE=14 LAYERS=4 [59, 63) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) 
UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=114 STAGE=14 LAYERS=4 [59, 63) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=91 STAGE=11 LAYERS=4 [47, 51) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=89 STAGE=11 LAYERS=4 [47, 51) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=75 STAGE=9 LAYERS=4 [39, 43) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=18 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=115 STAGE=14 LAYERS=4 [59, 63) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=74 STAGE=9 LAYERS=4 [39, 43) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=59 STAGE=7 LAYERS=4 [31, 35) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=11 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=10 
STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=121 STAGE=15 LAYERS=8 [63, 71) STAGE_PARAMS=1785878000 (1785.878M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=123 STAGE=15 LAYERS=8 [63, 71) STAGE_PARAMS=1785878000 (1785.878M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=120 STAGE=15 LAYERS=8 [63, 71) STAGE_PARAMS=1785878000 (1785.878M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=105 STAGE=13 LAYERS=4 [55, 59) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=50 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=51 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=49 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=8 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=9 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) 
UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=26 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=88 STAGE=11 LAYERS=4 [47, 51) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=90 STAGE=11 LAYERS=4 [47, 51) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=42 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=41 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=24 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=25 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=72 STAGE=9 LAYERS=4 [39, 43) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=73 STAGE=9 LAYERS=4 [39, 43) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=40 STAGE=5 
LAYERS=4 [23, 27) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=48 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=106 STAGE=13 LAYERS=4 [55, 59) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=107 STAGE=13 LAYERS=4 [55, 59) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=104 STAGE=13 LAYERS=4 [55, 59) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=27 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=43 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=56 STAGE=7 LAYERS=4 [31, 35) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=57 STAGE=7 LAYERS=4 [31, 35) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=58 STAGE=7 LAYERS=4 [31, 35) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) 
UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:27,513] [INFO] [engine.py:151:__init__] RANK=122 STAGE=15 LAYERS=8 [63, 71) STAGE_PARAMS=1785878000 (1785.878M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:43:33,820] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 44 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
[2022-01-26 18:43:33,946] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 60 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
[2022-01-26 18:43:34,240] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 108 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
[2022-01-26 18:43:34,351] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 116 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
[2022-01-26 18:43:34,537] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 28 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
[2022-01-26 18:43:34,666] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 92 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
> using checkpoint value 6e-05 for learning rate > using checkpoint value 6e-06 for minimum learning rate > using checkpoint value 3750000 for warmup iterations > using checkpoint value 600000000 for total number of iterations > using checkpoint value cosine for decay style
[2022-01-26 18:43:34,809] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 4 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported.
[2022-01-26 18:43:34,932] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 119 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
[2022-01-26 18:43:35,015] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 63 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
[2022-01-26 18:43:35,064] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 31 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
[2022-01-26 18:43:35,146] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 52 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
[2022-01-26 18:43:35,294] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 76 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
[2022-01-26 18:43:35,356] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 36 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
[2022-01-26 18:43:35,538] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 30 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
[2022-01-26 18:43:35,676] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 78 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
[2022-01-26 18:43:35,716] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 29 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
[2022-01-26 18:43:35,718] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 124 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
Killing subprocess 213605 Killing subprocess 213606 Killing subprocess 213607 Killing subprocess 213608 Killing subprocess 213609 Killing subprocess 213611 Killing subprocess 213612 Killing subprocess 213614 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in main() File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main sigkill_handler(signal.SIGTERM, None) # not coming back File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd) subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/cutting-edge/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py', '--local_rank=7', '--tensor-model-parallel-size', '4', '--pipeline-model-parallel-size', '16', '--num-layers', '64', '--hidden-size', '11600', '--num-attention-heads', '80', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', '--global-batch-size', '2048', '--train-samples', '600_000_000', '--train-tokens', '300_000_000_000', '--vocab-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--init-method-std', '0.006', '--fp16', '--checkpoint-activations', '--embed-layernorm', '--seed', '43', '--optimizer', 'adam', '--adam-beta1', '0.9', 
'--adam-beta2', '0.95', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr', '6e-6', '--lr-warmup-samples', '3_750_000', '--lr-decay-tokens', '260000000000', '--lr-decay-style', 'cosine', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1185', '--log-interval', '1', '--save-interval', '200', '--eval-interval', '150', '--eval-iters', '5', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs//tensorboard/cl-a100', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', '--deepspeed_config', './ds_config.1518981.json', '--zero-stage', '1', '--deepspeed-activation-checkpointing']' returned non-zero exit status 1. 
Killing subprocess 199706 Killing subprocess 199707 Killing subprocess 199708 Killing subprocess 199709 Killing subprocess 199710 Killing subprocess 199712 Killing subprocess 199713 Killing subprocess 199715 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in main() File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main sigkill_handler(signal.SIGTERM, None) # not coming back File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd) subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/cutting-edge/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py', '--local_rank=7', '--tensor-model-parallel-size', '4', '--pipeline-model-parallel-size', '16', '--num-layers', '64', '--hidden-size', '11600', '--num-attention-heads', '80', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', '--global-batch-size', '2048', '--train-samples', '600_000_000', '--train-tokens', '300_000_000_000', '--vocab-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--init-method-std', '0.006', '--fp16', '--checkpoint-activations', '--embed-layernorm', '--seed', '43', '--optimizer', 'adam', '--adam-beta1', '0.9', 
'--adam-beta2', '0.95', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr', '6e-6', '--lr-warmup-samples', '3_750_000', '--lr-decay-tokens', '260000000000', '--lr-decay-style', 'cosine', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1185', '--log-interval', '1', '--save-interval', '200', '--eval-interval', '150', '--eval-iters', '5', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs//tensorboard/cl-a100', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', '--deepspeed_config', './ds_config.1518981.json', '--zero-stage', '1', '--deepspeed-activation-checkpointing']' returned non-zero exit status 1. 
Killing subprocess 213315 Killing subprocess 213316 Killing subprocess 213317 Killing subprocess 213318 Killing subprocess 213319 Killing subprocess 213321 Killing subprocess 213323 Killing subprocess 213325 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in main() File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main sigkill_handler(signal.SIGTERM, None) # not coming back File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd) subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/cutting-edge/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py', '--local_rank=7', '--tensor-model-parallel-size', '4', '--pipeline-model-parallel-size', '16', '--num-layers', '64', '--hidden-size', '11600', '--num-attention-heads', '80', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', '--global-batch-size', '2048', '--train-samples', '600_000_000', '--train-tokens', '300_000_000_000', '--vocab-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--init-method-std', '0.006', '--fp16', '--checkpoint-activations', '--embed-layernorm', '--seed', '43', '--optimizer', 'adam', '--adam-beta1', '0.9', 
'--adam-beta2', '0.95', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr', '6e-6', '--lr-warmup-samples', '3_750_000', '--lr-decay-tokens', '260000000000', '--lr-decay-style', 'cosine', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1185', '--log-interval', '1', '--save-interval', '200', '--eval-interval', '150', '--eval-iters', '5', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs//tensorboard/cl-a100', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', '--deepspeed_config', './ds_config.1518981.json', '--zero-stage', '1', '--deepspeed-activation-checkpointing']' returned non-zero exit status 1. 
Killing subprocess 201303 Killing subprocess 201305 Killing subprocess 201306 Killing subprocess 201307 Killing subprocess 201308 Killing subprocess 201310 Killing subprocess 201312 Killing subprocess 201314 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in main() File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main sigkill_handler(signal.SIGTERM, None) # not coming back File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd) subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/cutting-edge/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py', '--local_rank=7', '--tensor-model-parallel-size', '4', '--pipeline-model-parallel-size', '16', '--num-layers', '64', '--hidden-size', '11600', '--num-attention-heads', '80', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', 
'--global-batch-size', '2048', '--train-samples', '600_000_000', '--train-tokens', '300_000_000_000', '--vocab-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--init-method-std', '0.006', '--fp16', '--checkpoint-activations', '--embed-layernorm', '--seed', '43', '--optimizer', 'adam', '--adam-beta1', '0.9', '--adam-beta2', '0.95', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr', '6e-6', '--lr-warmup-samples', '3_750_000', '--lr-decay-tokens', '260000000000', '--lr-decay-style', 'cosine', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1185', '--log-interval', '1', '--save-interval', '200', '--eval-interval', '150', '--eval-iters', '5', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs//tensorboard/cl-a100', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', '--deepspeed_config', './ds_config.1518981.json', '--zero-stage', '1', '--deepspeed-activation-checkpointing']' returned non-zero exit status 1. 
Killing subprocess 214112 Killing subprocess 214113 Killing subprocess 214114 Killing subprocess 214115 Killing subprocess 214117 Killing subprocess 214119 Killing subprocess 214121 Killing subprocess 214124 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in main() File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main sigkill_handler(signal.SIGTERM, None) # not coming back File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd) subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/cutting-edge/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py', '--local_rank=7', '--tensor-model-parallel-size', '4', '--pipeline-model-parallel-size', '16', '--num-layers', '64', '--hidden-size', '11600', '--num-attention-heads', '80', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', '--global-batch-size', '2048', '--train-samples', '600_000_000', '--train-tokens', '300_000_000_000', '--vocab-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--init-method-std', '0.006', '--fp16', '--checkpoint-activations', '--embed-layernorm', '--seed', '43', '--optimizer', 'adam', '--adam-beta1', '0.9', 
'--adam-beta2', '0.95', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr', '6e-6', '--lr-warmup-samples', '3_750_000', '--lr-decay-tokens', '260000000000', '--lr-decay-style', 'cosine', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1185', '--log-interval', '1', '--save-interval', '200', '--eval-interval', '150', '--eval-iters', '5', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs//tensorboard/cl-a100', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', '--deepspeed_config', './ds_config.1518981.json', '--zero-stage', '1', '--deepspeed-activation-checkpointing']' returned non-zero exit status 1. 
[2022-01-26 18:43:36,155] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 68 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
Killing subprocess 198944 Killing subprocess 198945 Killing subprocess 198946 Killing subprocess 198947 Killing subprocess 198948 Killing subprocess 198950 Killing subprocess 198952 Killing subprocess 198955 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in main() File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main sigkill_handler(signal.SIGTERM, None) # not coming back File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd) subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/cutting-edge/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py', '--local_rank=7', '--tensor-model-parallel-size', '4', '--pipeline-model-parallel-size', '16', '--num-layers', '64', '--hidden-size', '11600', '--num-attention-heads', '80', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', '--global-batch-size', '2048', '--train-samples', '600_000_000', '--train-tokens', '300_000_000_000', '--vocab-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--init-method-std', '0.006', '--fp16', '--checkpoint-activations', '--embed-layernorm', '--seed', '43', '--optimizer', 'adam', '--adam-beta1', '0.9', 
'--adam-beta2', '0.95', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr', '6e-6', '--lr-warmup-samples', '3_750_000', '--lr-decay-tokens', '260000000000', '--lr-decay-style', 'cosine', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1185', '--log-interval', '1', '--save-interval', '200', '--eval-interval', '150', '--eval-iters', '5', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs//tensorboard/cl-a100', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', '--deepspeed_config', './ds_config.1518981.json', '--zero-stage', '1', '--deepspeed-activation-checkpointing']' returned non-zero exit status 1. 
[2022-01-26 18:43:36,312] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 79 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
[2022-01-26 18:43:36,450] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 84 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
[2022-01-26 18:43:36,464] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 126 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
[2022-01-26 18:43:36,606] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 12 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
[2022-01-26 18:43:36,784] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 127 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
[2022-01-26 18:43:36,790] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 20 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
[2022-01-26 18:43:36,844] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 37 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
[2022-01-26 18:43:36,863] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 21 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
[2022-01-26 18:43:36,903] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 100 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
[2022-01-26 18:43:36,917] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 71 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
[2022-01-26 18:43:36,961] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 38 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
[2022-01-26 18:43:36,970] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 39 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
[2022-01-26 18:43:36,980] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 15 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
[2022-01-26 18:43:37,023] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 7 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
[2022-01-26 18:43:37,178] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 70 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
[2022-01-26 18:43:37,302] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 5 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
[2022-01-26 18:43:37,541] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 103 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
Killing subprocess 258016 Killing subprocess 258017 Killing subprocess 258018 Killing subprocess 258019 Killing subprocess 258020 Killing subprocess 258021 Killing subprocess 258022 Killing subprocess 258028 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in main() File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main sigkill_handler(signal.SIGTERM, None) # not coming back File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd) subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/cutting-edge/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py', '--local_rank=7', '--tensor-model-parallel-size', '4', '--pipeline-model-parallel-size', '16', '--num-layers', '64', '--hidden-size', '11600', '--num-attention-heads', '80', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', '--global-batch-size', '2048', '--train-samples', '600_000_000', '--train-tokens', '300_000_000_000', '--vocab-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--init-method-std', '0.006', '--fp16', '--checkpoint-activations', '--embed-layernorm', '--seed', '43', '--optimizer', 'adam', '--adam-beta1', '0.9', 
'--adam-beta2', '0.95', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr', '6e-6', '--lr-warmup-samples', '3_750_000', '--lr-decay-tokens', '260000000000', '--lr-decay-style', 'cosine', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1185', '--log-interval', '1', '--save-interval', '200', '--eval-interval', '150', '--eval-iters', '5', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs//tensorboard/cl-a100', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', '--deepspeed_config', './ds_config.1518981.json', '--zero-stage', '1', '--deepspeed-activation-checkpointing']' returned non-zero exit status 1. 
[2022-01-26 18:43:37,704] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 101 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
Killing subprocess 254257 Killing subprocess 254258 Killing subprocess 254259 Killing subprocess 254260 Killing subprocess 254261 Killing subprocess 254262 Killing subprocess 254265 Killing subprocess 254267 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 194, in _run_module_as_main Killing subprocess 199922 Killing subprocess 199923 Killing subprocess 199924 Killing subprocess 199925 Killing subprocess 199926 Killing subprocess 199927 Killing subprocess 199930 Killing subprocess 199932 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in main() File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main sigkill_handler(signal.SIGTERM, None) # not coming back File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd) subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/cutting-edge/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py', '--local_rank=7', '--tensor-model-parallel-size', '4', '--pipeline-model-parallel-size', '16', '--num-layers', '64', '--hidden-size', '11600', '--num-attention-heads', '80', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', '--global-batch-size', '2048', '--train-samples', '600_000_000', '--train-tokens', '300_000_000_000', '--vocab-file', 
'/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--init-method-std', '0.006', '--fp16', '--checkpoint-activations', '--embed-layernorm', '--seed', '43', '--optimizer', 'adam', '--adam-beta1', '0.9', '--adam-beta2', '0.95', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr', '6e-6', '--lr-warmup-samples', '3_750_000', '--lr-decay-tokens', '260000000000', '--lr-decay-style', 'cosine', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1185', '--log-interval', '1', '--save-interval', '200', '--eval-interval', '150', '--eval-iters', '5', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs//tensorboard/cl-a100', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', '--deepspeed_config', './ds_config.1518981.json', '--zero-stage', '1', '--deepspeed-activation-checkpointing']' returned non-zero exit status 1. 
Killing subprocess 213508 Killing subprocess 213509 Killing subprocess 213510 Killing subprocess 213511 Killing subprocess 213513 Killing subprocess 213515 Killing subprocess 213517 Killing subprocess 213518 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 194, in _run_module_as_main Killing subprocess 217567 Killing subprocess 217568 Killing subprocess 217569 Killing subprocess 217570 Killing subprocess 217572 Killing subprocess 217574 Killing subprocess 217577 Killing subprocess 217579 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 194, in _run_module_as_main Killing subprocess 214612 Killing subprocess 214613 Killing subprocess 214614 Killing subprocess 214615 Killing subprocess 214617 Killing subprocess 214619 Killing subprocess 214621 Killing subprocess 214623 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in main() File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main sigkill_handler(signal.SIGTERM, None) # not coming back File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd) subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/cutting-edge/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py', '--local_rank=7', '--tensor-model-parallel-size', '4', 
'--pipeline-model-parallel-size', '16', '--num-layers', '64', '--hidden-size', '11600', '--num-attention-heads', '80', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', '--global-batch-size', '2048', '--train-samples', '600_000_000', '--train-tokens', '300_000_000_000', '--vocab-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--init-method-std', '0.006', '--fp16', '--checkpoint-activations', '--embed-layernorm', '--seed', '43', '--optimizer', 'adam', '--adam-beta1', '0.9', '--adam-beta2', '0.95', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr', '6e-6', '--lr-warmup-samples', '3_750_000', '--lr-decay-tokens', '260000000000', '--lr-decay-style', 'cosine', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1185', '--log-interval', '1', '--save-interval', '200', '--eval-interval', '150', '--eval-iters', '5', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs//tensorboard/cl-a100', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', '--deepspeed_config', './ds_config.1518981.json', '--zero-stage', '1', '--deepspeed-activation-checkpointing']' returned non-zero exit status 1. 
return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in main() File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main sigkill_handler(signal.SIGTERM, None) # not coming back File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd) subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/cutting-edge/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py', '--local_rank=7', '--tensor-model-parallel-size', '4', '--pipeline-model-parallel-size', '16', '--num-layers', '64', '--hidden-size', '11600', '--num-attention-heads', '80', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', '--global-batch-size', '2048', '--train-samples', '600_000_000', '--train-tokens', '300_000_000_000', '--vocab-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--init-method-std', '0.006', '--fp16', '--checkpoint-activations', '--embed-layernorm', '--seed', '43', '--optimizer', 'adam', '--adam-beta1', '0.9', '--adam-beta2', '0.95', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr', '6e-6', '--lr-warmup-samples', '3_750_000', '--lr-decay-tokens', '260000000000', '--lr-decay-style', 'cosine', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1185', '--log-interval', '1', '--save-interval', '200', '--eval-interval', '150', '--eval-iters', 
'5', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs//tensorboard/cl-a100', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', '--deepspeed_config', './ds_config.1518981.json', '--zero-stage', '1', '--deepspeed-activation-checkpointing']' returned non-zero exit status 1. return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in main() File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main sigkill_handler(signal.SIGTERM, None) # not coming back File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd) subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/cutting-edge/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py', '--local_rank=7', '--tensor-model-parallel-size', '4', 
'--pipeline-model-parallel-size', '16', '--num-layers', '64', '--hidden-size', '11600', '--num-attention-heads', '80', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', '--global-batch-size', '2048', '--train-samples', '600_000_000', '--train-tokens', '300_000_000_000', '--vocab-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--init-method-std', '0.006', '--fp16', '--checkpoint-activations', '--embed-layernorm', '--seed', '43', '--optimizer', 'adam', '--adam-beta1', '0.9', '--adam-beta2', '0.95', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr', '6e-6', '--lr-warmup-samples', '3_750_000', '--lr-decay-tokens', '260000000000', '--lr-decay-style', 'cosine', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1185', '--log-interval', '1', '--save-interval', '200', '--eval-interval', '150', '--eval-iters', '5', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs//tensorboard/cl-a100', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', '--deepspeed_config', './ds_config.1518981.json', '--zero-stage', '1', '--deepspeed-activation-checkpointing']' returned non-zero exit status 1. 
main() File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main sigkill_handler(signal.SIGTERM, None) # not coming back File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd) subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/cutting-edge/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py', '--local_rank=7', '--tensor-model-parallel-size', '4', '--pipeline-model-parallel-size', '16', '--num-layers', '64', '--hidden-size', '11600', '--num-attention-heads', '80', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', '--global-batch-size', '2048', '--train-samples', '600_000_000', '--train-tokens', '300_000_000_000', '--vocab-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--init-method-std', '0.006', '--fp16', '--checkpoint-activations', '--embed-layernorm', '--seed', '43', '--optimizer', 'adam', '--adam-beta1', '0.9', '--adam-beta2', '0.95', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr', '6e-6', '--lr-warmup-samples', '3_750_000', '--lr-decay-tokens', '260000000000', '--lr-decay-style', 'cosine', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1185', '--log-interval', '1', '--save-interval', '200', '--eval-interval', '150', '--eval-iters', '5', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs//tensorboard/cl-a100', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', 
'/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', '--deepspeed_config', './ds_config.1518981.json', '--zero-stage', '1', '--deepspeed-activation-checkpointing']' returned non-zero exit status 1. [2022-01-26 18:43:38,015] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 6 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. 
Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. Killing subprocess 199892 Killing subprocess 199893 Killing subprocess 199894 Killing subprocess 199895 Killing subprocess 199896 Killing subprocess 199897 Killing subprocess 199898 Killing subprocess 199902 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in main() File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main sigkill_handler(signal.SIGTERM, None) # not coming back File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd) subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/cutting-edge/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py', '--local_rank=7', '--tensor-model-parallel-size', '4', '--pipeline-model-parallel-size', '16', '--num-layers', '64', '--hidden-size', '11600', '--num-attention-heads', '80', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', '--global-batch-size', '2048', '--train-samples', '600_000_000', '--train-tokens', '300_000_000_000', '--vocab-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--init-method-std', '0.006', '--fp16', 
'--checkpoint-activations', '--embed-layernorm', '--seed', '43', '--optimizer', 'adam', '--adam-beta1', '0.9', '--adam-beta2', '0.95', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr', '6e-6', '--lr-warmup-samples', '3_750_000', '--lr-decay-tokens', '260000000000', '--lr-decay-style', 'cosine', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1185', '--log-interval', '1', '--save-interval', '200', '--eval-interval', '150', '--eval-iters', '5', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs//tensorboard/cl-a100', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', '--deepspeed_config', './ds_config.1518981.json', '--zero-stage', '1', '--deepspeed-activation-checkpointing']' returned non-zero exit status 1. 
[2022-01-26 18:43:38,259] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 23 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2598, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 1 but the current world size is 2. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
Killing subprocess 214112 Killing subprocess 214113 Killing subprocess 214114 Killing subprocess 214115 Killing subprocess 214116 Killing subprocess 214118 Killing subprocess 214121 Killing subprocess 214123 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in main() File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main sigkill_handler(signal.SIGTERM, None) # not coming back File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd) subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/cutting-edge/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py', '--local_rank=7', '--tensor-model-parallel-size', '4', '--pipeline-model-parallel-size', '16', '--num-layers', '64', '--hidden-size', '11600', '--num-attention-heads', '80', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', '--global-batch-size', '2048', '--train-samples', '600_000_000', '--train-tokens', '300_000_000_000', '--vocab-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--init-method-std', '0.006', '--fp16', '--checkpoint-activations', '--embed-layernorm', '--seed', '43', '--optimizer', 'adam', '--adam-beta1', '0.9', 
'--adam-beta2', '0.95', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr', '6e-6', '--lr-warmup-samples', '3_750_000', '--lr-decay-tokens', '260000000000', '--lr-decay-style', 'cosine', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1185', '--log-interval', '1', '--save-interval', '200', '--eval-interval', '150', '--eval-iters', '5', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs//tensorboard/cl-a100', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', '--deepspeed_config', './ds_config.1518981.json', '--zero-stage', '1', '--deepspeed-activation-checkpointing']' returned non-zero exit status 1. 
Killing subprocess 214409 Killing subprocess 214410 Killing subprocess 214411 Killing subprocess 214412 Killing subprocess 214414 Killing subprocess 214416 Killing subprocess 214418 Killing subprocess 214420 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in main() File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main sigkill_handler(signal.SIGTERM, None) # not coming back File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd) subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/cutting-edge/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py', '--local_rank=7', '--tensor-model-parallel-size', '4', '--pipeline-model-parallel-size', '16', '--num-layers', '64', '--hidden-size', '11600', '--num-attention-heads', '80', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', '--global-batch-size', '2048', '--train-samples', '600_000_000', '--train-tokens', '300_000_000_000', '--vocab-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--init-method-std', '0.006', '--fp16', '--checkpoint-activations', '--embed-layernorm', '--seed', '43', '--optimizer', 'adam', '--adam-beta1', '0.9', 
'--adam-beta2', '0.95', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr', '6e-6', '--lr-warmup-samples', '3_750_000', '--lr-decay-tokens', '260000000000', '--lr-decay-style', 'cosine', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1185', '--log-interval', '1', '--save-interval', '200', '--eval-interval', '150', '--eval-iters', '5', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs//tensorboard/cl-a100', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', '--deepspeed_config', './ds_config.1518981.json', '--zero-stage', '1', '--deepspeed-activation-checkpointing']' returned non-zero exit status 1. 
srun: error: jean-zay-iam37: task 5: Exited with exit code 1 srun: Terminating job step 1518981.0 slurmstepd: error: *** STEP 1518981.0 ON jean-zay-iam31 CANCELLED AT 2022-01-26T18:43:38 *** Killing subprocess 205635 Killing subprocess 205636 Killing subprocess 205637 Killing subprocess 205638 Killing subprocess 205639 Killing subprocess 205640 Killing subprocess 205641 Killing subprocess 205642 Main process received SIGTERM, exiting srun: error: jean-zay-iam50: task 13: Exited with exit code 1 srun: error: jean-zay-iam43: task 6: Exited with exit code 1 srun: error: jean-zay-iam44: task 7: Exited with exit code 1 srun: error: jean-zay-iam51: task 14: Exited with exit code 1 srun: error: jean-zay-iam48: task 11: Exited with exit code 1 srun: error: jean-zay-iam46: task 9: Exited with exit code 1 srun: error: jean-zay-iam32: task 1: Exited with exit code 1 srun: error: jean-zay-iam47: task 10: Exited with exit code 1 srun: error: jean-zay-iam35: task 4: Exited with exit code 1 srun: error: jean-zay-iam45: task 8: Exited with exit code 1 srun: error: jean-zay-iam49: task 12: Terminated srun: error: jean-zay-iam33: task 2: Terminated srun: error: jean-zay-iam31: task 0: Exited with exit code 1 srun: error: jean-zay-iam34: task 3: Exited with exit code 1 srun: error: jean-zay-iam52: task 15: Exited with exit code 1 srun: Force Terminated job step 1518981.0 ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** > setting tensorboard ... using world size: 128, data-parallel-size: 1, tensor-model-parallel size: 4, pipeline-model-parallel size: 32 using torch.float16 for parameters ... ------------------------ arguments ------------------------ accumulate_allreduce_grads_in_fp32 .............. False adam_beta1 ...................................... 0.9 adam_beta2 ...................................... 0.95 adam_eps ........................................ 1e-08 adlr_autoresume ................................. False adlr_autoresume_interval ........................ 1000 apply_query_key_layer_scaling ................... True apply_residual_connection_post_layernorm ........ False attention_dropout ............................... 0.1 attention_softmax_in_fp32 ....................... False bert_binary_head ................................ True bert_load ....................................... None bf16 ............................................ False bias_dropout_fusion ............................. True bias_gelu_fusion ................................ True biencoder_projection_dim ........................ 0 biencoder_shared_query_context_model ............ False block_data_path ................................. None checkpoint_activations .......................... True checkpoint_in_cpu ............................... 
False checkpoint_num_layers ........................... 1 clip_grad ....................................... 1.0 codecarbon_dir .................................. None consumed_train_samples .......................... 0 consumed_train_tokens ........................... 0 consumed_valid_samples .......................... 0 contigious_checkpointing ........................ False cpu_optimizer ................................... False cpu_torch_adam .................................. False curriculum_learning ............................. False data_impl ....................................... mmap data_parallel_size .............................. 1 data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document'] dataloader_type ................................. single DDP_impl ........................................ local decoder_seq_length .............................. None deepscale ....................................... False deepscale_config ................................ None deepspeed ....................................... True deepspeed_activation_checkpointing .............. True deepspeed_config ................................ ./ds_config.1519069.json deepspeed_mpi ................................... False distribute_checkpointed_activations ............. False distributed_backend ............................. nccl embed_layernorm ................................. True embedding_path .................................. None encoder_seq_length .............................. 2048 eod_mask_loss ................................... False eval_interval ................................... 150 eval_iters ...................................... 5 eval_only ....................................... None evidence_data_path .............................. None exit_duration_in_mins ........................... 1185 exit_interval ................................... 
None ffn_hidden_size ................................. 46400 finetune ........................................ False fp16 ............................................ True fp16_lm_cross_entropy ........................... False fp32_residual_connection ........................ False gigaflos_no_embeds .............................. 0 global_batch_size ............................... 2048 glu_activation .................................. None hidden_dropout .................................. 0.1 hidden_size ..................................... 11600 hysteresis ...................................... 2 ict_head_size ................................... None ict_load ........................................ None img_dim ......................................... 224 indexer_batch_size .............................. 128 indexer_log_interval ............................ 1000 init_method_std ................................. 0.006 init_method_xavier_uniform ...................... False initial_loss_scale .............................. 4294967296 kv_channels ..................................... 145 layernorm_epsilon ............................... 1e-05 lazy_mpu_init ................................... None load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 local_rank ...................................... 0 log_batch_size_to_tensorboard ................... True log_interval .................................... 1 log_learning_rate_to_tensorboard ................ True log_level ....................................... None log_level_replica ............................... None log_loss_scale_to_tensorboard ................... True log_num_zeros_in_grad ........................... False log_params_norm ................................. False log_timers_to_tensorboard ....................... True log_validation_ppl_to_tensorboard ............... True loss_on_targets_only ............................ 
False loss_scale ...................................... 12.0 loss_scale_window ............................... 1000 lr .............................................. 6e-05 lr_decay_iters .................................. None lr_decay_samples ................................ None lr_decay_style .................................. cosine lr_decay_tokens ................................. 260000000000 lr_warmup_fraction .............................. None lr_warmup_iters ................................. 0 lr_warmup_samples ............................... 3750000 make_vocab_size_divisible_by .................... 128 mask_prob ....................................... 0.15 masked_softmax_fusion ........................... True max_position_embeddings ......................... 2048 memory_centric_tiled_linear ..................... False merge_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt micro_batch_size ................................ 1 min_loss_scale .................................. 1.0 min_lr .......................................... 6e-06 mmap_warmup ..................................... False no_load_optim ................................... None no_load_rng ..................................... None no_save_optim ................................... None no_save_rng ..................................... None num_attention_heads ............................. 80 num_channels .................................... 3 num_classes ..................................... 1000 num_layers ...................................... 64 num_layers_per_virtual_pipeline_stage ........... None num_workers ..................................... 2 onnx_safe ....................................... None openai_gelu ..................................... False optimizer ....................................... adam override_lr_scheduler ........................... 
False params_dtype .................................... torch.float16 partition_activations ........................... False patch_dim ....................................... 16 pipeline_model_parallel_size .................... 32 position_embedding_type ......................... PositionEmbeddingType.absolute profile_backward ................................ False query_in_block_prob ............................. 0.1 rampup_batch_size ............................... None rank ............................................ 0 remote_device ................................... none reset_attention_mask ............................ False reset_position_ids .............................. False retriever_report_topk_accuracies ................ [] retriever_score_scaling ......................... False retriever_seq_length ............................ 256 reweight_loss_based_on_position_frequency ....... False sample_rate ..................................... 1.0 save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 save_interval ................................... 200 scatter_gather_tensors_in_pipeline .............. True scattered_embeddings ............................ False seed ............................................ 43 seq_length ...................................... 2048 sgd_momentum .................................... 0.9 short_seq_prob .................................. 0.1 skip_train_iteration_range ...................... None split ........................................... 949,50,1 split_transformers .............................. False synchronize_each_layer .......................... False tensor_model_parallel_size ...................... 4 tensorboard_debug_dir ........................... None tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs//tensorboard/cl-a100 tensorboard_log_interval ........................ 
1 tensorboard_queue_size .......................... 5 test_weighted_split_names ....................... None test_weighted_split_paths ....................... None test_weighted_split_splits ...................... None test_weighted_split_weights ..................... None tile_factor ..................................... 1 titles_data_path ................................ None tokenizer_name_or_path .......................... None tokenizer_type .................................. GPT2BPETokenizer train_iters ..................................... None train_samples ................................... 600000000 train_tokens .................................... 300000000000 train_weighted_split_paths ...................... None use_bnb_optimizer ............................... False use_checkpoint_lr_scheduler ..................... False use_contiguous_buffers_in_ddp ................... False use_cpu_initialization .......................... None use_one_sent_docs ............................... False use_pin_memory .................................. False valid_weighted_split_names ...................... None valid_weighted_split_paths ...................... None valid_weighted_split_splits ..................... None valid_weighted_split_weights .................... None virtual_pipeline_model_parallel_size ............ None vocab_extra_ids ................................. 0 vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json weight_decay .................................... 0.1 world_size ...................................... 128 zero_allgather_bucket_size ...................... 0.0 zero_contigious_gradients ....................... False zero_reduce_bucket_size ......................... 0.0 zero_reduce_scatter ............................. False zero_stage ...................................... 
1 -------------------- end of arguments --------------------- setting number of micro-batches to constant 2048 > building GPT2BPETokenizer tokenizer ... > padded vocab (size: 50257) with 431 dummy tokens (new size: 50688) -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] cpu_adagrad ............ [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] deepspeed info ................... 0.6.0+2d51f617, 2d51f617, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=f689231 git_branch=log-grad-norm **** > initializing torch distributed ... > initializing tensor model parallel with size 4 > initializing pipeline model parallel with size 32 > setting random seeds to 43 ... [2022-01-26 18:47:11,367] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2761 and data parallel seed: 43 > compiling dataset index builder ... make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/data' make: Nothing to be done for 'default'. make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/data' >>> done with dataset index builder. Compilation time: 0.137 seconds > compiling and loading fused kernels ... /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! 
warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module scaled_upper_triang_masked_softmax_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module scaled_upper_triang_masked_softmax_cuda... /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module scaled_masked_softmax_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module scaled_masked_softmax_cuda... /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module fused_mix_prec_layer_norm_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module fused_mix_prec_layer_norm_cuda... >>> done with compiling and loading fused kernels. Compilation time: 6.766 seconds time to initialize megatron (seconds): 38.272 [after megatron is initialized] datetime: 2022-01-26 18:47:18 building GPT model ... 
[2022-01-26 18:47:18,273] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,273] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,273] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 
[2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 
[2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 18:47:18,274] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 
[2022-01-26 18:47:18,323] [INFO] [utils.py:824:see_memory_usage] Before Building Model
[2022-01-26 18:47:18,324] [INFO] [utils.py:825:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2022-01-26 18:47:18,324] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 39.26 GB, percent = 7.8%
[2022-01-26 18:47:18,325] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=1, data=0, model=0): 4, ProcessCoord(pipe=1, data=0, model=1): 5, ProcessCoord(pipe=1, data=0, model=2): 6, ProcessCoord(pipe=1, data=0, model=3): 7, ProcessCoord(pipe=2, data=0, model=0): 8, ProcessCoord(pipe=2, data=0, model=1): 9, ProcessCoord(pipe=2, data=0, model=2): 10, ProcessCoord(pipe=2, data=0, model=3): 11, ProcessCoord(pipe=3, data=0, model=0): 12, ProcessCoord(pipe=3, data=0, model=1): 13, ProcessCoord(pipe=3, data=0, model=2): 14, ProcessCoord(pipe=3, data=0, model=3): 15, ProcessCoord(pipe=4, data=0, model=0): 16, ProcessCoord(pipe=4, data=0, model=1): 17, ProcessCoord(pipe=4, data=0, model=2): 18, ProcessCoord(pipe=4, data=0, model=3): 19, ProcessCoord(pipe=5, data=0, model=0): 20, ProcessCoord(pipe=5, data=0, model=1): 21, ProcessCoord(pipe=5, data=0, model=2): 22, ProcessCoord(pipe=5, data=0, model=3): 23, ProcessCoord(pipe=6, data=0, model=0): 24, ProcessCoord(pipe=6, data=0, model=1): 25, ProcessCoord(pipe=6, data=0, model=2): 26, ProcessCoord(pipe=6, data=0, model=3): 27, ProcessCoord(pipe=7, data=0, model=0): 28, ProcessCoord(pipe=7, data=0, model=1): 29, ProcessCoord(pipe=7, data=0, model=2): 30, ProcessCoord(pipe=7, data=0, model=3): 31, ProcessCoord(pipe=8, data=0, model=0): 32, ProcessCoord(pipe=8, data=0, model=1): 33, ProcessCoord(pipe=8, data=0, model=2): 34, ProcessCoord(pipe=8, data=0, model=3): 35, ProcessCoord(pipe=9, data=0, model=0): 36, ProcessCoord(pipe=9, data=0, 
model=1): 37, ProcessCoord(pipe=9, data=0, model=2): 38, ProcessCoord(pipe=9, data=0, model=3): 39, ProcessCoord(pipe=10, data=0, model=0): 40, ProcessCoord(pipe=10, data=0, model=1): 41, ProcessCoord(pipe=10, data=0, model=2): 42, ProcessCoord(pipe=10, data=0, model=3): 43, ProcessCoord(pipe=11, data=0, model=0): 44, ProcessCoord(pipe=11, data=0, model=1): 45, ProcessCoord(pipe=11, data=0, model=2): 46, ProcessCoord(pipe=11, data=0, model=3): 47, ProcessCoord(pipe=12, data=0, model=0): 48, ProcessCoord(pipe=12, data=0, model=1): 49, ProcessCoord(pipe=12, data=0, model=2): 50, ProcessCoord(pipe=12, data=0, model=3): 51, ProcessCoord(pipe=13, data=0, model=0): 52, ProcessCoord(pipe=13, data=0, model=1): 53, ProcessCoord(pipe=13, data=0, model=2): 54, ProcessCoord(pipe=13, data=0, model=3): 55, ProcessCoord(pipe=14, data=0, model=0): 56, ProcessCoord(pipe=14, data=0, model=1): 57, ProcessCoord(pipe=14, data=0, model=2): 58, ProcessCoord(pipe=14, data=0, model=3): 59, ProcessCoord(pipe=15, data=0, model=0): 60, ProcessCoord(pipe=15, data=0, model=1): 61, ProcessCoord(pipe=15, data=0, model=2): 62, ProcessCoord(pipe=15, data=0, model=3): 63, ProcessCoord(pipe=16, data=0, model=0): 64, ProcessCoord(pipe=16, data=0, model=1): 65, ProcessCoord(pipe=16, data=0, model=2): 66, ProcessCoord(pipe=16, data=0, model=3): 67, ProcessCoord(pipe=17, data=0, model=0): 68, ProcessCoord(pipe=17, data=0, model=1): 69, ProcessCoord(pipe=17, data=0, model=2): 70, ProcessCoord(pipe=17, data=0, model=3): 71, ProcessCoord(pipe=18, data=0, model=0): 72, ProcessCoord(pipe=18, data=0, model=1): 73, ProcessCoord(pipe=18, data=0, model=2): 74, ProcessCoord(pipe=18, data=0, model=3): 75, ProcessCoord(pipe=19, data=0, model=0): 76, ProcessCoord(pipe=19, data=0, model=1): 77, ProcessCoord(pipe=19, data=0, model=2): 78, ProcessCoord(pipe=19, data=0, model=3): 79, ProcessCoord(pipe=20, data=0, model=0): 80, ProcessCoord(pipe=20, data=0, model=1): 81, ProcessCoord(pipe=20, data=0, model=2): 82, 
ProcessCoord(pipe=20, data=0, model=3): 83, ProcessCoord(pipe=21, data=0, model=0): 84, ProcessCoord(pipe=21, data=0, model=1): 85, ProcessCoord(pipe=21, data=0, model=2): 86, ProcessCoord(pipe=21, data=0, model=3): 87, ProcessCoord(pipe=22, data=0, model=0): 88, ProcessCoord(pipe=22, data=0, model=1): 89, ProcessCoord(pipe=22, data=0, model=2): 90, ProcessCoord(pipe=22, data=0, model=3): 91, ProcessCoord(pipe=23, data=0, model=0): 92, ProcessCoord(pipe=23, data=0, model=1): 93, ProcessCoord(pipe=23, data=0, model=2): 94, ProcessCoord(pipe=23, data=0, model=3): 95, ProcessCoord(pipe=24, data=0, model=0): 96, ProcessCoord(pipe=24, data=0, model=1): 97, ProcessCoord(pipe=24, data=0, model=2): 98, ProcessCoord(pipe=24, data=0, model=3): 99, ProcessCoord(pipe=25, data=0, model=0): 100, ProcessCoord(pipe=25, data=0, model=1): 101, ProcessCoord(pipe=25, data=0, model=2): 102, ProcessCoord(pipe=25, data=0, model=3): 103, ProcessCoord(pipe=26, data=0, model=0): 104, ProcessCoord(pipe=26, data=0, model=1): 105, ProcessCoord(pipe=26, data=0, model=2): 106, ProcessCoord(pipe=26, data=0, model=3): 107, ProcessCoord(pipe=27, data=0, model=0): 108, ProcessCoord(pipe=27, data=0, model=1): 109, ProcessCoord(pipe=27, data=0, model=2): 110, ProcessCoord(pipe=27, data=0, model=3): 111, ProcessCoord(pipe=28, data=0, model=0): 112, ProcessCoord(pipe=28, data=0, model=1): 113, ProcessCoord(pipe=28, data=0, model=2): 114, ProcessCoord(pipe=28, data=0, model=3): 115, ProcessCoord(pipe=29, data=0, model=0): 116, ProcessCoord(pipe=29, data=0, model=1): 117, ProcessCoord(pipe=29, data=0, model=2): 118, ProcessCoord(pipe=29, data=0, model=3): 119, ProcessCoord(pipe=30, data=0, model=0): 120, ProcessCoord(pipe=30, data=0, model=1): 121, ProcessCoord(pipe=30, data=0, model=2): 122, ProcessCoord(pipe=30, data=0, model=3): 123, ProcessCoord(pipe=31, data=0, model=0): 124, ProcessCoord(pipe=31, data=0, model=1): 125, ProcessCoord(pipe=31, data=0, model=2): 126, ProcessCoord(pipe=31, data=0, 
model=3): 127}
[2022-01-26 18:47:20,002] [INFO] [module.py:365:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=5
    0: _to_float16
    1: EmbeddingPipe
    2:
    3: ParallelTransformerLayerPipe
    4: ParallelTransformerLayerPipe
stage=1 layers=2
    5: ParallelTransformerLayerPipe
    6: ParallelTransformerLayerPipe
stage=2 layers=2
    7: ParallelTransformerLayerPipe
    8: ParallelTransformerLayerPipe
stage=3 layers=2
    9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
stage=4 layers=2
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
stage=5 layers=2
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=6 layers=2
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
stage=7 layers=2
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
stage=8 layers=2
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
stage=9 layers=2
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
stage=10 layers=2
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
stage=11 layers=2
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
stage=12 layers=2
    27: ParallelTransformerLayerPipe
    28: ParallelTransformerLayerPipe
stage=13 layers=2
    29: ParallelTransformerLayerPipe
    30: ParallelTransformerLayerPipe
stage=14 layers=2
    31: ParallelTransformerLayerPipe
    32: ParallelTransformerLayerPipe
stage=15 layers=2
    33: ParallelTransformerLayerPipe
    34: ParallelTransformerLayerPipe
stage=16 layers=2
    35: ParallelTransformerLayerPipe
    36: ParallelTransformerLayerPipe
stage=17 layers=2
    37: ParallelTransformerLayerPipe
    38: ParallelTransformerLayerPipe
stage=18 layers=2
    39: ParallelTransformerLayerPipe
    40: ParallelTransformerLayerPipe
stage=19 layers=2
    41: ParallelTransformerLayerPipe
    42: ParallelTransformerLayerPipe
stage=20 layers=2
    43: ParallelTransformerLayerPipe
    44: ParallelTransformerLayerPipe
stage=21 layers=2
    45: ParallelTransformerLayerPipe
    46: ParallelTransformerLayerPipe
stage=22 layers=2
    47: ParallelTransformerLayerPipe
    48: ParallelTransformerLayerPipe
stage=23 layers=2
    49: ParallelTransformerLayerPipe
    50: ParallelTransformerLayerPipe
stage=24 layers=2
    51: ParallelTransformerLayerPipe
    52: ParallelTransformerLayerPipe
stage=25 layers=2
    53: ParallelTransformerLayerPipe
    54: ParallelTransformerLayerPipe
stage=26 layers=2
    55: ParallelTransformerLayerPipe
    56: ParallelTransformerLayerPipe
stage=27 layers=2
    57: ParallelTransformerLayerPipe
    58: ParallelTransformerLayerPipe
stage=28 layers=2
    59: ParallelTransformerLayerPipe
    60: ParallelTransformerLayerPipe
stage=29 layers=2
    61: ParallelTransformerLayerPipe
    62: ParallelTransformerLayerPipe
stage=30 layers=2
    63: ParallelTransformerLayerPipe
    64: ParallelTransformerLayerPipe
stage=31 layers=6
    65: ParallelTransformerLayerPipe
    66: ParallelTransformerLayerPipe
    67:
    68: MixedFusedLayerNorm
    69: EmbeddingPipe
    70: float16_to_fp32
  loss: CrossEntropy
[2022-01-26 18:47:20,703] [INFO] [utils.py:824:see_memory_usage] After Building Model
[2022-01-26 18:47:20,703] [INFO] [utils.py:825:see_memory_usage] MA 1.88 GB Max_MA 1.88 GB CA 1.91 GB Max_CA 2 GB
[2022-01-26 18:47:20,704] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 39.97 GB, percent = 7.9%
setting training iterations to 292968
> learning rate decay style: cosine
DeepSpeed is enabled.
[2022-01-26 18:47:20,791] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed info: version=0.6.0+2d51f617, git-hash=2d51f617, git-branch=master
Rank: 88 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 89 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 93 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 92 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 118 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 47 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 119 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 112 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 51 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 50 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 46 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 120 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 121 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 113 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 101 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
[2022-01-26 18:47:21,930] [INFO] [engine.py:275:__init__] DeepSpeed Flops Profiler Enabled: False
[2022-01-26 18:47:21,930] [INFO] [engine.py:1093:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2022-01-26 18:47:21,930] [INFO] [engine.py:1099:_configure_optimizer] Using client Optimizer as basic optimizer
[2022-01-26 18:47:21,930] [INFO] [engine.py:1115:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2022-01-26 18:47:21,930] [INFO] [utils.py:44:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2022-01-26 18:47:21,930] [INFO] [logging.py:69:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2022-01-26 18:47:21,931] [INFO] [stage_1_and_2.py:121:__init__] Reduce bucket size 500000000
[2022-01-26 18:47:21,931] [INFO] [stage_1_and_2.py:122:__init__] Allgather bucket size 500000000
[2022-01-26 18:47:21,931] [INFO] [stage_1_and_2.py:123:__init__] CPU Offload: False
[2022-01-26 18:47:21,931] [INFO] [stage_1_and_2.py:124:__init__] Round robin gradient partitioning: False
Rank: 100 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 16 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 17 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 23 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 22 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 13 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 12 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 9 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 8 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 81 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 80 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 127 partition count [1, 1] and sizes[(978123600, False), (214600, False)]
Rank: 126 partition count [1, 1] and sizes[(978123600, False), (214600, False)]
Rank: 123 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 122 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 116 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 117 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 114 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 115 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 4 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 5 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 108 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 109 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 19 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 67 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 69 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 65 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 18 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 94 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 71 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 45 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 95 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 44 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 53 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 66 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 68 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 52 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 70 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 7 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 64 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 54 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 55 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 83 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 6 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 82 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 24 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 25 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 99 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 98 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 103 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 102 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 72 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 74 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 75 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 105 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 104 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 36 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 37 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 90 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 110 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 111 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 91 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 107 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 20 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 21 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 11 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 73 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 106 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 78 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 79 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 33 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 15 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 10 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 14 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 38 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 32 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 41 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 40 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 42 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 43 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 63 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 48 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 49 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 62 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 27 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 26 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 86 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 87 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 39 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 84 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 85 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 59 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 97 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 96 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 58 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 31 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 30 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 56 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 57 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 35 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 34 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 124 partition count [1, 1] and sizes[(978123600, False), (214600, False)]
Rank: 125 partition count [1, 1] and sizes[(978123600, False), (214600, False)]
Rank: 77 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 76 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 29 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 28 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 2 partition count [1, 1] and sizes[(978123600, False), (191400, False)]
Rank: 3 partition count [1, 1] and sizes[(978123600, False), (191400, False)]
Rank: 60 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 61 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 1 partition count [1, 1] and sizes[(978123600, False), (191400, False)]
Rank: 0 partition count [1, 1] and sizes[(978123600, False), (191400, False)]
[2022-01-26 18:47:26,464] [INFO] [utils.py:824:see_memory_usage] Before initializing optimizer states
[2022-01-26 18:47:26,465] [INFO] [utils.py:825:see_memory_usage] MA 5.47 GB Max_MA 7.29 GB CA 9.25 GB Max_CA 9 GB
[2022-01-26 18:47:26,465] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 39.49 GB, percent = 7.8%
[2022-01-26 18:47:26,530] [INFO] [utils.py:824:see_memory_usage] After initializing optimizer states
[2022-01-26 18:47:26,531] [INFO] [utils.py:825:see_memory_usage] MA 12.76 GB Max_MA 16.41 GB CA 20.19 GB Max_CA 20 GB
[2022-01-26 18:47:26,531] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 39.49 GB, percent = 7.8%
[2022-01-26 18:47:26,531] [INFO] [stage_1_and_2.py:493:__init__] optimizer state initialized
[2022-01-26 18:47:26,556] [INFO] [utils.py:824:see_memory_usage] After initializing ZeRO optimizer
[2022-01-26 18:47:26,557] [INFO] [utils.py:825:see_memory_usage] MA 12.76 GB Max_MA 12.76 GB CA 20.19 GB Max_CA 20 GB
[2022-01-26 18:47:26,557] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 39.49 GB, percent = 7.8%
[2022-01-26 18:47:26,557] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2022-01-26 18:47:26,557] [INFO] [engine.py:805:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2022-01-26 18:47:26,557] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2022-01-26 18:47:26,557] [INFO] [logging.py:69:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.95), (0.9, 0.95)]
[2022-01-26 18:47:26,557] [INFO] [config.py:1058:print] DeepSpeedEngine configuration:
[2022-01-26 18:47:26,557] [INFO] [config.py:1062:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
[2022-01-26 18:47:26,557] [INFO] [config.py:1062:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2022-01-26 18:47:26,557] [INFO] [config.py:1062:print] amp_enabled .................. False
[2022-01-26 18:47:26,557] [INFO] [config.py:1062:print] amp_params ................... False
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] autotuning_config ............ 
{ "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": null, "exps_dir": null, "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 }
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] bfloat16_enabled ............. False
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] checkpoint_tag_validation_enabled True
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] checkpoint_tag_validation_fail False
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] communication_data_type ...... None
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] curriculum_enabled ........... True
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] curriculum_params ............ {'curriculum_type': 'seqlen', 'min_difficulty': 64, 'max_difficulty': 2048, 'schedule_type': 'fixed_linear', 'schedule_config': {'total_curriculum_step': 36000, 'difficulty_step': 8}}
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] dataloader_drop_last ......... False
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] disable_allgather ............ False
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] dump_state ................... False
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] eigenvalue_enabled ........... False
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] eigenvalue_gas_boundary_resolution 1
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] eigenvalue_layer_name ........ bert.encoder.layer
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] eigenvalue_layer_num ......... 0
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] eigenvalue_max_iter .......... 100
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] eigenvalue_stability ......... 1e-06
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] eigenvalue_tol ............... 0.01
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] eigenvalue_verbose ........... False
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] elasticity_enabled ........... False
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] fp16_enabled ................. True
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] fp16_master_weights_and_gradients False
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] fp16_mixed_quantize .......... False
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] global_rank .................. 0
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] gradient_accumulation_steps .. 2048
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] gradient_clipping ............ 1.0
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] gradient_predivide_factor .... 1.0
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] initial_dynamic_scale ........ 4096
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] loss_scale ................... 0
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] memory_breakdown ............. False
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] optimizer_legacy_fusion ...... False
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] optimizer_name ............... None
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] optimizer_params ............. None
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] pld_enabled .................. False
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] pld_params ................... False
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] prescale_gradients ........... False
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] quantize_change_rate ......... 0.001
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] quantize_groups .............. 1
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] quantize_offset .............. 1000
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] quantize_period .............. 1000
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] quantize_rounding ............ 0
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] quantize_start_bits .......... 16
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] quantize_target_bits ......... 8
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] quantize_training_enabled .... False
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] quantize_type ................ 0
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] quantize_verbose ............. False
[2022-01-26 18:47:26,558] [INFO] [config.py:1062:print] scheduler_name ............... None
[2022-01-26 18:47:26,559] [INFO] [config.py:1062:print] scheduler_params ............. None
[2022-01-26 18:47:26,559] [INFO] [config.py:1062:print] sparse_attention ............. None
[2022-01-26 18:47:26,559] [INFO] [config.py:1062:print] sparse_gradients_enabled ..... 
False [2022-01-26 18:47:26,559] [INFO] [config.py:1062:print] steps_per_print .............. 2000 [2022-01-26 18:47:26,559] [INFO] [config.py:1062:print] tensorboard_enabled .......... False [2022-01-26 18:47:26,559] [INFO] [config.py:1062:print] tensorboard_job_name ......... DeepSpeedJobName [2022-01-26 18:47:26,559] [INFO] [config.py:1062:print] tensorboard_output_path ...... [2022-01-26 18:47:26,559] [INFO] [config.py:1062:print] train_batch_size ............. 2048 [2022-01-26 18:47:26,559] [INFO] [config.py:1062:print] train_micro_batch_size_per_gpu 1 [2022-01-26 18:47:26,559] [INFO] [config.py:1062:print] use_quantizer_kernel ......... False [2022-01-26 18:47:26,559] [INFO] [config.py:1062:print] wall_clock_breakdown ......... False [2022-01-26 18:47:26,559] [INFO] [config.py:1062:print] world_size ................... 1 [2022-01-26 18:47:26,559] [INFO] [config.py:1062:print] zero_allow_untested_optimizer False [2022-01-26 18:47:26,559] [INFO] [config.py:1062:print] zero_config .................. { "stage": 1, "contiguous_gradients": true, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": false, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false } [2022-01-26 18:47:26,559] [INFO] [config.py:1062:print] zero_enabled ................. True [2022-01-26 18:47:26,559] [INFO] [config.py:1062:print] zero_optimization_stage ...... 
1 [2022-01-26 18:47:26,559] [INFO] [config.py:1064:print] json = { "train_micro_batch_size_per_gpu": 1, "train_batch_size": 2.048000e+03, "gradient_clipping": 1.0, "elastic_checkpoint": true, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "curriculum_learning": { "enabled": true, "curriculum_type": "seqlen", "min_difficulty": 64, "max_difficulty": 2.048000e+03, "schedule_type": "fixed_linear", "schedule_config": { "total_curriculum_step": 3.600000e+04, "difficulty_step": 8 } }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false } [2022-01-26 18:47:26,559] [INFO] [engine.py:93:__init__] CONFIG: micro_batches=2048 micro_batch_size=1 [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=0 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=3 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=7 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=1 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=4 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=2 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 
(104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=5 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=6 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=66 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=68 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=65 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=64 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=70 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=71 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=69 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=67 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 
(807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=32 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=35 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=34 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=39 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=96 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=99 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=98 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=38 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=33 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] 
[engine.py:151:__init__] RANK=36 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=97 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=101 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=103 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=18 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=22 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=102 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=100 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=20 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=17 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 
(104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=16 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=23 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=84 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=80 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=82 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=83 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=37 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=55 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=48 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=53 STAGE=13 
LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=49 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=19 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=21 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=10 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=14 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=8 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=105 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=108 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=46 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) 
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=43 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=47 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=44 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=85 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=86 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=117 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=118 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=113 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=116 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=51 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=11 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=12 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=13 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=15 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=104 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=107 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=122 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=123 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=41 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=40 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=81 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=112 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=114 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=52 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=54 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=50 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=91 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=92 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=90 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=93 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=9 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=73 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=75 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=76 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=72 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=109 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=110 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=63 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=58 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=59 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=61 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=127 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=121 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=42 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=87 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=115 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=26 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=24 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=27 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=30 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=89 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=95 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=74 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=78 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=111 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=62 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=57 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=60 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=125 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=45 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=119 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=28 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=25 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=88 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=77 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=79 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=106 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=56 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=120 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=126 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=124 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=29 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=31 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 18:47:28,893] [INFO] [engine.py:151:__init__] RANK=94 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
> using checkpoint value 6e-05 for learning rate
> using checkpoint value 6e-06 for minimum learning rate
> using checkpoint value 3750000 for warmup iterations
> using checkpoint value 600000000 for total number of iterations
> using checkpoint value cosine for decay style
[2022-01-26 18:47:52,062] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 40
[2022-01-26 18:47:52,141] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 64
[2022-01-26 18:47:52,562] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 108
[2022-01-26 18:47:53,402] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 93
[2022-01-26 18:47:53,911] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 81
[2022-01-26 18:47:54,007] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 112
[2022-01-26 18:47:54,030] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 64
[2022-01-26 18:47:54,139] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 40
[2022-01-26 18:47:54,299] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 108
[2022-01-26 18:47:54,692] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 53
[2022-01-26 18:47:55,016] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 59
[2022-01-26 18:47:55,033] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 115
[2022-01-26 18:47:55,195] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 93
[2022-01-26 18:47:55,428] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 30
[2022-01-26 18:47:55,551] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 112
[2022-01-26 18:47:55,579] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 81
[2022-01-26 18:47:55,630] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 120
[2022-01-26 18:47:55,663] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 100
[2022-01-26 18:47:55,838] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 87
[2022-01-26 18:47:55,988] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 11
[2022-01-26 18:47:56,052] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 45
[2022-01-26 18:47:56,236] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 61
[2022-01-26 18:47:56,362] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 53
[2022-01-26 18:47:56,479] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 115
[2022-01-26 18:47:56,523] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 8
[2022-01-26 18:47:56,639] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 25
[2022-01-26 18:47:56,782] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 65
[2022-01-26 18:47:56,835] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 59
[2022-01-26 18:47:57,152] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 12
[2022-01-26 18:47:57,173] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 116
[2022-01-26 18:47:57,191] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 27
[2022-01-26 18:47:57,193] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 77
[2022-01-26 18:47:57,208] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 35
[2022-01-26 18:47:57,264] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 76
[2022-01-26 18:47:57,265] [INFO]
[engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 87 [2022-01-26 18:47:57,348] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 120 [2022-01-26 18:47:57,358] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 5 [2022-01-26 18:47:57,428] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 30 [2022-01-26 18:47:57,437] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 45 [2022-01-26 18:47:57,498] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 100 [2022-01-26 18:47:57,516] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 74 [2022-01-26 18:47:57,571] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 11 [2022-01-26 18:47:57,663] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 61 [2022-01-26 18:47:57,676] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 99 [2022-01-26 18:47:57,682] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 51 [2022-01-26 18:47:57,735] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 36 [2022-01-26 18:47:57,951] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 8 [2022-01-26 18:47:58,048] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 49 [2022-01-26 18:47:58,061] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 24 [2022-01-26 18:47:58,188] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 37 [2022-01-26 18:47:58,221] [INFO] 
[engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 65 [2022-01-26 18:47:58,251] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 3 [2022-01-26 18:47:58,253] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 32 [2022-01-26 18:47:58,305] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 57 [2022-01-26 18:47:58,488] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 25 [2022-01-26 18:47:58,544] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 12 [2022-01-26 18:47:58,563] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 116 [2022-01-26 18:47:58,601] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 66 [2022-01-26 18:47:58,809] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 27 [2022-01-26 18:47:58,830] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 79 [2022-01-26 18:47:58,857] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 72 [2022-01-26 18:47:58,860] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 35 [2022-01-26 18:47:58,982] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 5 [2022-01-26 18:47:59,063] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 51 [2022-01-26 18:47:59,113] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 99 [2022-01-26 18:47:59,179] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 76 [2022-01-26 18:47:59,187] [INFO] [engine.py:2609:_load_zero_checkpoint] 
loading 1 zero partition checkpoints for rank 77 [2022-01-26 18:47:59,213] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 36 [2022-01-26 18:47:59,223] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 74 [2022-01-26 18:47:59,374] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 127 [2022-01-26 18:47:59,487] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 49 [2022-01-26 18:47:59,535] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 24 [2022-01-26 18:47:59,574] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 60 [2022-01-26 18:47:59,680] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 37 [2022-01-26 18:47:59,685] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 57 [2022-01-26 18:47:59,745] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 21 [2022-01-26 18:47:59,767] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 32 [2022-01-26 18:47:59,826] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 82 [2022-01-26 18:47:59,985] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 80 [2022-01-26 18:47:59,992] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 3 [2022-01-26 18:48:00,009] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 98 [2022-01-26 18:48:00,057] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 71 [2022-01-26 18:48:00,127] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO 
state_dicts for rank 26 [2022-01-26 18:48:00,164] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 66 [2022-01-26 18:48:00,333] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 94 [2022-01-26 18:48:00,400] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 79 [2022-01-26 18:48:00,424] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 109 [2022-01-26 18:48:00,434] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 72 [2022-01-26 18:48:00,521] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 17 [2022-01-26 18:48:00,544] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 42 [2022-01-26 18:48:00,578] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 47 [2022-01-26 18:48:00,636] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 38 [2022-01-26 18:48:00,758] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 18 [2022-01-26 18:48:00,883] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 70 [2022-01-26 18:48:00,923] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 122 [2022-01-26 18:48:01,132] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 85 [2022-01-26 18:48:01,244] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 21 [2022-01-26 18:48:01,272] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 60 [2022-01-26 18:48:01,316] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for 
rank 67 [2022-01-26 18:48:01,367] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 127 [2022-01-26 18:48:01,457] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 71 [2022-01-26 18:48:01,462] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 98 [2022-01-26 18:48:01,497] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 16 [2022-01-26 18:48:01,542] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 80 [2022-01-26 18:48:01,552] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 26 [2022-01-26 18:48:01,579] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 62 [2022-01-26 18:48:01,584] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 102 [2022-01-26 18:48:01,649] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 86 [2022-01-26 18:48:01,703] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 44 [2022-01-26 18:48:01,706] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 126 [2022-01-26 18:48:01,728] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 82 [2022-01-26 18:48:01,770] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 94 [2022-01-26 18:48:01,849] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 109 [2022-01-26 18:48:01,928] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 55 [2022-01-26 18:48:01,961] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 52 [2022-01-26 
18:48:01,971] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 42 [2022-01-26 18:48:02,017] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 38 [2022-01-26 18:48:02,035] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 101 [2022-01-26 18:48:02,088] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 47 [2022-01-26 18:48:02,256] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 23 [2022-01-26 18:48:02,290] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 18 [2022-01-26 18:48:02,327] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 17 [2022-01-26 18:48:02,355] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 122 [2022-01-26 18:48:02,429] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 29 [2022-01-26 18:48:02,434] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 70 [2022-01-26 18:48:02,441] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 9 [2022-01-26 18:48:02,449] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 89 [2022-01-26 18:48:02,527] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 124 [2022-01-26 18:48:02,528] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 39 [2022-01-26 18:48:02,564] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 46 [2022-01-26 18:48:02,678] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 125 [2022-01-26 18:48:02,888] [INFO] 
[engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 67 [2022-01-26 18:48:02,946] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 16 [2022-01-26 18:48:02,960] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 62 [2022-01-26 18:48:02,966] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 92 [2022-01-26 18:48:02,977] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 102 [2022-01-26 18:48:03,055] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 85 [2022-01-26 18:48:03,113] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 44 [2022-01-26 18:48:03,135] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 86 [2022-01-26 18:48:03,176] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 68 [2022-01-26 18:48:03,283] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 34 [2022-01-26 18:48:03,345] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 55 [2022-01-26 18:48:03,373] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 52 [2022-01-26 18:48:03,458] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 101 [2022-01-26 18:48:03,475] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 114 [2022-01-26 18:48:03,479] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 126 [2022-01-26 18:48:03,690] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 23 [2022-01-26 18:48:03,811] [INFO] [engine.py:2609:_load_zero_checkpoint] 
loading 1 zero partition checkpoints for rank 9 [2022-01-26 18:48:03,823] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 106 [2022-01-26 18:48:03,825] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 89 [2022-01-26 18:48:03,860] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 29 [2022-01-26 18:48:03,913] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 39 [2022-01-26 18:48:03,938] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 46 [2022-01-26 18:48:03,950] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 83 [2022-01-26 18:48:04,017] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 73 [2022-01-26 18:48:04,107] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 19 [2022-01-26 18:48:04,309] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 95 [2022-01-26 18:48:04,386] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 92 [2022-01-26 18:48:04,465] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 124 [2022-01-26 18:48:04,486] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 110 [2022-01-26 18:48:04,595] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 119 [2022-01-26 18:48:04,639] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 121 [2022-01-26 18:48:04,667] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 34 [2022-01-26 18:48:04,696] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 
ZeRO state_dicts for rank 111 [2022-01-26 18:48:04,741] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 113 [2022-01-26 18:48:04,823] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 33 [2022-01-26 18:48:04,824] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 68 [2022-01-26 18:48:05,014] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 125 [2022-01-26 18:48:05,033] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 114 [2022-01-26 18:48:05,036] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 97 [2022-01-26 18:48:05,125] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 14 [2022-01-26 18:48:05,268] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 15 [2022-01-26 18:48:05,273] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 4 [2022-01-26 18:48:05,276] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 20 [2022-01-26 18:48:05,282] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 106 [2022-01-26 18:48:05,317] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 22 [2022-01-26 18:48:05,322] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 0 [2022-01-26 18:48:05,326] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 91 [2022-01-26 18:48:05,358] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 83 [2022-01-26 18:48:05,429] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts 
for rank 78 [2022-01-26 18:48:05,474] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 28 [2022-01-26 18:48:05,508] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 73 [2022-01-26 18:48:05,518] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 48 [2022-01-26 18:48:05,536] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 117 [2022-01-26 18:48:05,818] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 19 [2022-01-26 18:48:06,042] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 96 [2022-01-26 18:48:06,100] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 113 [2022-01-26 18:48:06,130] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 110 [2022-01-26 18:48:06,140] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 119 [2022-01-26 18:48:06,181] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 121 [2022-01-26 18:48:06,186] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 33 [2022-01-26 18:48:06,187] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 90 [2022-01-26 18:48:06,257] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 95 [2022-01-26 18:48:06,349] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 111 [2022-01-26 18:48:06,386] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 103 [2022-01-26 18:48:06,401] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 56 [2022-01-26 
18:48:06,426] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 118 [2022-01-26 18:48:06,520] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 69 [2022-01-26 18:48:06,646] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 41 [2022-01-26 18:48:06,683] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 97 [2022-01-26 18:48:06,703] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 20 [2022-01-26 18:48:06,720] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 14 [2022-01-26 18:48:06,744] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 22 [2022-01-26 18:48:06,750] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 4 [2022-01-26 18:48:06,880] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 28 [2022-01-26 18:48:06,882] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 15 [2022-01-26 18:48:06,893] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 123 [2022-01-26 18:48:06,895] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 78 [2022-01-26 18:48:06,997] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 117 [2022-01-26 18:48:07,015] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 48 [2022-01-26 18:48:07,097] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 91 [2022-01-26 18:48:07,177] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 0 checkpoint version 3.0 [2022-01-26 18:48:07,450] [INFO] 
[engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 96 [2022-01-26 18:48:07,632] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 90 [2022-01-26 18:48:07,787] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 56 [2022-01-26 18:48:07,798] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 103 [2022-01-26 18:48:07,823] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 118 [2022-01-26 18:48:07,885] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 69 [2022-01-26 18:48:07,991] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 50 [2022-01-26 18:48:08,077] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 84 [2022-01-26 18:48:08,084] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 10 [2022-01-26 18:48:08,225] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 41 [2022-01-26 18:48:08,257] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 123 [2022-01-26 18:48:08,626] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 88 [2022-01-26 18:48:08,663] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 75 [2022-01-26 18:48:08,783] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 54 [2022-01-26 18:48:09,225] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 104 [2022-01-26 18:48:09,260] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 2 [2022-01-26 18:48:09,295] [INFO] 
[engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 58 [2022-01-26 18:48:09,463] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 10 [2022-01-26 18:48:09,465] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 84 [2022-01-26 18:48:09,790] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 50 [2022-01-26 18:48:09,985] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 88 [2022-01-26 18:48:10,011] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 31 [2022-01-26 18:48:10,027] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 75 [2022-01-26 18:48:10,468] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 54 [2022-01-26 18:48:10,668] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 104 [2022-01-26 18:48:10,719] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 58 [2022-01-26 18:48:10,977] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 2 [2022-01-26 18:48:11,075] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 105 [2022-01-26 18:48:11,378] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 31 [2022-01-26 18:48:11,720] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 6 [2022-01-26 18:48:12,001] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 43 [2022-01-26 18:48:12,502] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 7 [2022-01-26 18:48:12,671] [INFO] [engine.py:2609:_load_zero_checkpoint] 
loading 1 zero partition checkpoints for rank 105 [2022-01-26 18:48:12,762] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 63 [2022-01-26 18:48:13,097] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 6 [2022-01-26 18:48:13,366] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 43 [2022-01-26 18:48:13,895] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 7 [2022-01-26 18:48:14,130] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 63 [2022-01-26 18:48:14,272] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 107 [2022-01-26 18:48:15,746] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 107 [2022-01-26 18:48:16,038] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 1 [2022-01-26 18:48:17,684] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 1 [2022-01-26 18:48:18,458] [INFO] [engine.py:2679:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 13 [2022-01-26 18:48:19,813] [INFO] [engine.py:2609:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 13 successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 at iteration 15000 time (ms) | load-checkpoint: 49631.92 /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/utils.py:279: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 125.22432 
estimated model parameters without embeddings: 103.368064 [after model, optimizer, and learning rate scheduler are built] datetime: 2022-01-26 18:48:19 > building train, validation, and test datasets ... > datasets target sizes (minimum size): train: 600000000 validation: 20008960 test: 10240 > building train, validation, and test datasets for GPT ... > building dataset index ... reading sizes... reading pointers... reading document index... creating numpy buffer of mmap... creating memory view of numpy buffer... > finished creating indexed dataset in 6.197326 seconds number of documents: 304230423 > dataset split: train: document indices in [0, 288714672) total of 288714672 documents validation: document indices in [288714672, 303926193) total of 15211521 documents test: document indices in [303926193, 304230423) total of 304230 documents > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_doc_idx.npy > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_sample_idx.npy > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_shuffle_idx.npy loaded indexed file in 0.146 seconds total number of samples: 657686117 total number of epochs: 5 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_20008960ns_2048sl_43s_doc_idx.npy > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_20008960ns_2048sl_43s_sample_idx.npy > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_20008960ns_2048sl_43s_shuffle_idx.npy loaded indexed file in 0.170 seconds total number of samples: 20781483 total number of epochs: 
3 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_doc_idx.npy > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_sample_idx.npy > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_shuffle_idx.npy loaded indexed file in 0.072 seconds total number of samples: 137384 total number of epochs: 1 > finished creating GPT datasets ... [after dataloaders are built] datetime: 2022-01-26 18:48:34 done with setup ... training ... Number of parameters: [tensor rank - pipeline rank] w/ and w/o embeddings: time (ms) | model-and-optimizer-setup: 61551.77 | train/valid/test-data-iterators-setup: 13525.15 [003-001] 103.3651B / 103.3651B[001-001] 103.3651B / 103.3651B [002-030] 103.3651B / 103.3651B[003-030] 103.3651B / 103.3651B [002-008] 103.3651B / 103.3651B [001-025] 103.3651B / 103.3651B[002-025] 103.3651B / 103.3651B [001-017] 103.3651B / 103.3651B[003-017] 103.3651B / 103.3651B [003-016] 103.3651B / 103.3651B [002-004] 103.3651B / 103.3651B[001-005] 103.3651B / 103.3651B [002-005] 103.3651B / 103.3651B [002-001] 103.3651B / 103.3651B[003-000] 125.2243B / 103.3681B [001-003] 103.3651B / 103.3651B [003-015] 103.3651B / 103.3651B[002-015] 103.3651B / 103.3651B [001-014] 103.3651B / 103.3651B [001-030] 103.3651B / 103.3651B [003-021] 103.3651B / 103.3651B [001-028] 103.3651B / 103.3651B[002-028] 103.3651B / 103.3651B[001-029] 103.3651B / 103.3651B[002-029] 103.3651B / 103.3651B [003-006] 103.3651B / 103.3651B[003-007] 103.3651B / 103.3651B [001-008] 103.3651B / 103.3651B[002-009] 103.3651B / 103.3651B [003-013] 103.3651B / 103.3651B[002-012] 103.3651B / 103.3651B [003-024] 103.3651B / 103.3651B [002-024] 103.3651B / 103.3651B[001-024] 103.3651B / 103.3651B [001-016] 103.3651B / 103.3651B [001-004] 103.3651B / 
103.3651B [003-005] 103.3651B / 103.3651B [001-000] 125.2243B / 103.3681B [003-003] 103.3651B / 103.3651B[003-002] 103.3651B / 103.3651B [001-018] 103.3651B / 103.3651B [001-019] 103.3651B / 103.3651B [002-026] 103.3651B / 103.3651B[001-026] 103.3651B / 103.3651B [002-027] 103.3651B / 103.3651B [001-015] 103.3651B / 103.3651B [003-031] 125.2273B / 103.3710B [001-011] 103.3651B / 103.3651B [002-020] 103.3651B / 103.3651B [001-020] 103.3651B / 103.3651B [003-029] 103.3651B / 103.3651B [002-007] 103.3651B / 103.3651B [001-007] 103.3651B / 103.3651B[002-006] 103.3651B / 103.3651B [003-008] 103.3651B / 103.3651B[001-009] 103.3651B / 103.3651B [002-013] 103.3651B / 103.3651B [001-022] 103.3651B / 103.3651B [003-023] 103.3651B / 103.3651B [003-025] 103.3651B / 103.3651B [002-016] 103.3651B / 103.3651B[002-017] 103.3651B / 103.3651B [000-004] 103.3651B / 103.3651B [002-000] 125.2243B / 103.3681B [001-002] 103.3651B / 103.3651B [002-018] 103.3651B / 103.3651B [001-027] 103.3651B / 103.3651B [002-014] 103.3651B / 103.3651B [003-014] 103.3651B / 103.3651B [001-031] 125.2273B / 103.3710B [001-010] 103.3651B / 103.3651B [002-010] 103.3651B / 103.3651B[003-010] 103.3651B / 103.3651B [002-011] 103.3651B / 103.3651B[003-011] 103.3651B / 103.3651B [002-021] 103.3651B / 103.3651B[003-020] 103.3651B / 103.3651B[001-021] 103.3651B / 103.3651B [000-028] 103.3651B / 103.3651B [000-007] 103.3651B / 103.3651B [003-009] 103.3651B / 103.3651B [001-012] 103.3651B / 103.3651B[003-012] 103.3651B / 103.3651B [001-013] 103.3651B / 103.3651B [002-023] 103.3651B / 103.3651B[002-022] 103.3651B / 103.3651B [001-023] 103.3651B / 103.3651B [000-024] 103.3651B / 103.3651B [000-016] 103.3651B / 103.3651B [000-005] 103.3651B / 103.3651B [000-001] 103.3651B / 103.3651B[000-000] 125.2243B / 103.3681B [002-002] 103.3651B / 103.3651B [003-018] 103.3651B / 103.3651B [003-019] 103.3651B / 103.3651B [002-019] 103.3651B / 103.3651B [003-026] 103.3651B / 103.3651B [003-027] 103.3651B / 103.3651B [000-015] 
103.3651B / 103.3651B [002-031] 125.2273B / 103.3710B [000-011] 103.3651B / 103.3651B [000-020] 103.3651B / 103.3651B[000-021] 103.3651B / 103.3651B [000-029] 103.3651B / 103.3651B [000-006] 103.3651B / 103.3651B [000-009] 103.3651B / 103.3651B [000-012] 103.3651B / 103.3651B [003-022] 103.3651B / 103.3651B [000-025] 103.3651B / 103.3651B [000-017] 103.3651B / 103.3651B [002-003] 103.3651B / 103.3651B [000-018] 103.3651B / 103.3651B [000-026] 103.3651B / 103.3651B[000-027] 103.3651B / 103.3651B [000-014] 103.3651B / 103.3651B [000-030] 103.3651B / 103.3651B [000-008] 103.3651B / 103.3651B [000-023] 103.3651B / 103.3651B [000-003] 103.3651B / 103.3651B [000-031] 125.2273B / 103.3710B [000-022] 103.3651B / 103.3651B [000-002] 103.3651B / 103.3651B [003-004] 103.3651B / 103.3651B [000-010] 103.3651B / 103.3651B [000-019] 103.3651B / 103.3651B [001-006] 103.3651B / 103.3651B [000-013] 103.3651B / 103.3651B [003-028] 103.3651B / 103.3651B [before the start of training step] datetime: 2022-01-26 18:48:34 [2022-01-26 18:48:34,497] [INFO] [checkpointing.py:547:forward] Activation Checkpointing Information [2022-01-26 18:48:34,497] [INFO] [checkpointing.py:548:forward] ----Partition Activations False, CPU CHECKPOINTING False [2022-01-26 18:48:34,497] [INFO] [checkpointing.py:551:forward] ----contiguous Memory Checkpointing False with 64 total layers [2022-01-26 18:48:34,497] [INFO] [checkpointing.py:554:forward] ----Synchronization False [2022-01-26 18:48:34,497] [INFO] [checkpointing.py:555:forward] ----Profiling time in checkpointing False [Rank 4] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 0] (after 15001 iterations) memory (MB) | allocated: 13225.39697265625 | max allocated: 20689.01318359375 | reserved: 24404.0 | max reserved: 24404.0 [Rank 124] (after 15001 iterations) memory (MB) | allocated: 13239.271484375 | max allocated: 20703.6201171875 | reserved: 
24404.0 | max reserved: 24404.0 [Rank 8] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 3] (after 15001 iterations) memory (MB) | allocated: 13224.93798828125 | max allocated: 20688.55419921875 | reserved: 24404.0 | max reserved: 24404.0 [Rank 11] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 7] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 5] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 1] (after 15001 iterations) memory (MB) | allocated: 13225.34326171875 | max allocated: 20688.95947265625 | reserved: 24404.0 | max reserved: 24404.0 [Rank 20] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 12] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 9] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 125] (after 15001 iterations) memory (MB) | allocated: 13239.271484375 | max allocated: 20702.9326171875 | reserved: 24404.0 | max reserved: 24404.0 [Rank 16] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 28] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 36] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max 
allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 24] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 32] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 40] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 64] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 48] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 52] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 72] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 68] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 76] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 44] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 84] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 80] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 60] (after 15001 iterations) memory 
(MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 56] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 100] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 88] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 92] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 108] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 116] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 104] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 96] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 120] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 112] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 2] (after 15001 iterations) memory (MB) | allocated: 13222.77197265625 | max allocated: 20686.38818359375 | reserved: 24404.0 | max reserved: 24404.0 [Rank 126] (after 15001 iterations) memory (MB) | allocated: 13239.271484375 | max allocated: 20703.6201171875 | reserved: 24404.0 | max reserved: 
24404.0 [Rank 6] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 10] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 19] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 23] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 31] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 27] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 15] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 39] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 35] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 47] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 43] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 63] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 55] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 
16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 71] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 51] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 67] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 59] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 83] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 95] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 75] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 79] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 99] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0[Rank 103] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 87] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 91] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 123] (after 15001 iterations) memory (MB) | 
allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 115] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0[Rank 119] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 111] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 107] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 iteration 15001/ 292968 | consumed samples: 30722048 | consumed tokens: 14543749120 | elapsed time per iteration (ms): 264168.6 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.695227E+00 | loss scale: 262144.0 | grad norm: 172370.614 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.008 | TFLOPs: 44.48 | [Rank 127] (after 15001 iterations) memory (MB) | allocated: 13239.271484375 | max allocated: 20702.9326171875 | reserved: 24404.0 | max reserved: 24404.0 [Rank 22] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 18] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 14] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 13] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 17] (after 15001 iterations) memory (MB) | allocated: 
10796.22998046875 | max allocated: 16956.51123046875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 21] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 30] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 29] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 26] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 34] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 33] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 38] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 25] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 37] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16957.16162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 41] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 42] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 50] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 54] (after 
15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 70] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0[Rank 66] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 46] (after 15001 iterations) memory (MB) | allocated: 10796.8388671875 | max allocated: 16957.0205078125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 62] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0[Rank 58] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 82] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0[Rank 86] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 94] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 98] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 74] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 90] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 106] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | 
max reserved: 20072.0 [Rank 78] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 102] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 114] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0[Rank 118] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 110] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 122] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 45] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 53] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 77] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 65] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 49] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 73] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 61] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max 
allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 69] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 85] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 89] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 101] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 57] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16957.16162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 93] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 81] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 97] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 105] (after 15001 iterations) memory (MB) | allocated: 10796.8388671875 | max allocated: 16957.0205078125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 113] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 109] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 117] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 121] (after 15001 iterations) 
memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 iteration 15002/ 292968 | consumed samples: 30724096 | consumed tokens: 14545567744 | elapsed time per iteration (ms): 161571.8 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.734951E+00 | loss scale: 262144.0 | grad norm: 187802.534 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 72.72 | iteration 15003/ 292968 | consumed samples: 30726144 | consumed tokens: 14547386368 | elapsed time per iteration (ms): 159618.9 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.717034E+00 | loss scale: 262144.0 | grad norm: 198015.959 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 73.61 | iteration 15004/ 292968 | consumed samples: 30728192 | consumed tokens: 14549204992 | elapsed time per iteration (ms): 144748.5 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.705877E+00 | loss scale: 262144.0 | grad norm: 188953.119 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 81.17 | iteration 15005/ 292968 | consumed samples: 30730240 | consumed tokens: 14551023616 | elapsed time per iteration (ms): 142732.8 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.702031E+00 | loss scale: 262144.0 | grad norm: 162012.634 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 82.32 | srun: Job step aborted: Waiting up to 62 seconds for job step to finish. 
Killing subprocess 214660 Killing subprocess 214184 Killing subprocess 254856 Killing subprocess 213890 Killing subprocess 214185 Killing subprocess 214661 Killing subprocess 254857 Killing subprocess 214186 Killing subprocess 213891 Killing subprocess 214662 Killing subprocess 258627 Killing subprocess 258628 Killing subprocess 200284 Killing subprocess 258629 Killing subprocess 215175 Killing subprocess 200285 Killing subprocess 214663 Killing subprocess 214665 Killing subprocess 214666 Killing subprocess 214955 Killing subprocess 214187 slurmstepd: error: *** STEP 1519069.0 ON jean-zay-iam31 CANCELLED AT 2022-01-26T19:04:17 *** Killing subprocess 218114 Killing subprocess 199520 Killing subprocess 214188 Killing subprocess 214189 Killing subprocess 200286 Killing subprocess 214191 Killing subprocess 215176 Killing subprocess 254858 Killing subprocess 214667 Killing subprocess 213892 Killing subprocess 254859 Killing subprocess 213893 Killing subprocess 254860 Killing subprocess 213894 Killing subprocess 254861 Killing subprocess 213896 Killing subprocess 254864 Killing subprocess 213898 Killing subprocess 254866 Main process received SIGTERM, exiting Killing subprocess 214956 Killing subprocess 218115 Killing subprocess 199521 Killing subprocess 215177 Killing subprocess 218116 Killing subprocess 214957 Killing subprocess 214668 Killing subprocess 200458 Killing subprocess 199522 Killing subprocess 215178 Killing subprocess 214669 Killing subprocess 258630 Killing subprocess 258632 Killing subprocess 258634 Killing subprocess 258635 Killing subprocess 258637 Killing subprocess 200459 Main process received SIGTERM, exiting Killing subprocess 201871 Killing subprocess 200460 Killing subprocess 200427 Killing subprocess 206740 Killing subprocess 201872 Killing subprocess 200428 Killing subprocess 199523 Killing subprocess 199524 Killing subprocess 201873 Killing subprocess 206741 Killing subprocess 199526 Killing subprocess 199529 Killing subprocess 214033 Killing 
subprocess 200429
Killing subprocess 206742
Killing subprocess 206743
Killing subprocess 200287
Killing subprocess 214034
Killing subprocess 200288
Killing subprocess 206744
Killing subprocess 200289
Killing subprocess 200290
Killing subprocess 214035
Killing subprocess 200291
Main process received SIGTERM, exiting
Killing subprocess 206745
Killing subprocess 214193
Main process received SIGTERM, exiting
Killing subprocess 206748
Killing subprocess 206750
Main process received SIGTERM, exiting
Killing subprocess 214670
Killing subprocess 218117
Killing subprocess 214672
Killing subprocess 218119
Killing subprocess 214674
Killing subprocess 218121
Killing subprocess 214676
Killing subprocess 218122
Killing subprocess 201874
Killing subprocess 214679
Killing subprocess 200461
Killing subprocess 218123
Killing subprocess 201875
Main process received SIGTERM, exiting
Killing subprocess 200462
Main process received SIGTERM, exiting
Killing subprocess 214036
Killing subprocess 201877
Killing subprocess 200430
Killing subprocess 200464
Killing subprocess 201879
Killing subprocess 200431
Killing subprocess 200465
Killing subprocess 201881
Killing subprocess 200433
Killing subprocess 200467
Killing subprocess 214038
Killing subprocess 200434
Main process received SIGTERM, exiting
Killing subprocess 214040
Killing subprocess 200436
Killing subprocess 214042
Killing subprocess 214044
Main process received SIGTERM, exiting
Killing subprocess 214958
Killing subprocess 215179
Killing subprocess 214960
Killing subprocess 213900
Killing subprocess 215181
Killing subprocess 214961
Main process received SIGTERM, exiting
Killing subprocess 215183
Killing subprocess 214964
Killing subprocess 215186
Killing subprocess 214966
Main process received SIGTERM, exiting
Main process received SIGTERM, exiting
Killing subprocess 199531
Main process received SIGTERM, exiting
Killing subprocess 214668
Main process received SIGTERM, exiting
Main process received SIGTERM, exiting
Killing subprocess 214671
Main process received SIGTERM, exiting
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
using world size: 128, data-parallel-size: 2, tensor-model-parallel size: 4, pipeline-model-parallel size: 16
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.95
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. None
consumed_train_samples .......................... 0
consumed_train_tokens ........................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
curriculum_learning ............................. False
data_impl ....................................... mmap
data_parallel_size .............................. 2
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.1519422.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embed_layernorm ................................. True
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 150
eval_iters ...................................... 5
eval_only ....................................... None
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1185
exit_interval ................................... None
ffn_hidden_size ................................. 46400
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
gigaflos_no_embeds .............................. 0
global_batch_size ............................... 2048
glu_activation .................................. None
hidden_dropout .................................. 0.1
hidden_size ..................................... 11600
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.006
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 145
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 1
log_learning_rate_to_tensorboard ................ True
log_level ....................................... None
log_level_replica ............................... None
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_on_targets_only ............................ False
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 6e-05
lr_decay_iters .................................. None
lr_decay_samples ................................ None
lr_decay_style .................................. cosine
lr_decay_tokens ................................. 260000000000
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 3750000
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... 2048
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt
micro_batch_size ................................ 1
min_loss_scale .................................. 1.0
min_lr .......................................... 6e-06
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 80
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 64
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 16
position_embedding_type ......................... PositionEmbeddingType.absolute
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... None
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
reweight_loss_based_on_position_frequency ....... False
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
save_interval ................................... 200
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 43
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
skip_train_iteration_range ...................... None
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 4
tensorboard_debug_dir ........................... None
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs//tensorboard/cl-a100
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
test_weighted_split_names ....................... None
test_weighted_split_paths ....................... None
test_weighted_split_splits ...................... None
test_weighted_split_weights ..................... None
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 600000000
train_tokens .................................... 300000000000
train_weighted_split_paths ...................... None
use_bnb_optimizer ............................... False
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
valid_weighted_split_names ...................... None
valid_weighted_split_paths ...................... None
valid_weighted_split_splits ..................... None
valid_weighted_split_weights .................... None
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 128
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
setting number of micro-batches to constant 1024
> building GPT2BPETokenizer tokenizer ...
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed']
deepspeed info ................... 0.6.0+24fe7002, 24fe7002, elastic-ckpt-refresh
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
**** Git info for Megatron: git_hash=f689231 git_branch=log-grad-norm ****
> initializing torch distributed ...
> setting tensorboard ...
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 16
> setting random seeds to 43 ...
[2022-01-26 19:04:57,645] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2761 and data parallel seed: 43
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.135 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux.
Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux.
Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux.
Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
>>> done with compiling and loading fused kernels. Compilation time: 8.075 seconds
time to initialize megatron (seconds): -46.142
[after megatron is initialized] datetime: 2022-01-26 19:05:05
building GPT model ...
[2022-01-26 19:05:05,858] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1
[2022-01-26 19:05:05,859] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1
[2022-01-26 19:05:05,860] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1
[2022-01-26 19:05:05,860] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 19:05:05,860] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 19:05:05,860] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 19:05:05,860] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 19:05:05,860] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 19:05:05,860] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 19:05:05,860] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 19:05:05,860] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 19:05:05,860] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 19:05:05,860] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 19:05:05,860] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 19:05:05,860] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 19:05:05,860] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 19:05:05,860] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 19:05:05,860] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-26 19:05:05,893] [INFO] [utils.py:824:see_memory_usage] Before Building Model [2022-01-26 19:05:05,893] [INFO] 
[utils.py:825:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB [2022-01-26 19:05:05,893] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 39.54 GB, percent = 7.9% [2022-01-26 19:05:05,894] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=1, data=0, model=0): 8, ProcessCoord(pipe=1, data=0, model=1): 9, ProcessCoord(pipe=1, data=0, model=2): 10, ProcessCoord(pipe=1, data=0, model=3): 11, ProcessCoord(pipe=1, data=1, model=0): 12, ProcessCoord(pipe=1, data=1, model=1): 13, ProcessCoord(pipe=1, data=1, model=2): 14, ProcessCoord(pipe=1, data=1, model=3): 15, ProcessCoord(pipe=2, data=0, model=0): 16, ProcessCoord(pipe=2, data=0, model=1): 17, ProcessCoord(pipe=2, data=0, model=2): 18, ProcessCoord(pipe=2, data=0, model=3): 19, ProcessCoord(pipe=2, data=1, model=0): 20, ProcessCoord(pipe=2, data=1, model=1): 21, ProcessCoord(pipe=2, data=1, model=2): 22, ProcessCoord(pipe=2, data=1, model=3): 23, ProcessCoord(pipe=3, data=0, model=0): 24, ProcessCoord(pipe=3, data=0, model=1): 25, ProcessCoord(pipe=3, data=0, model=2): 26, ProcessCoord(pipe=3, data=0, model=3): 27, ProcessCoord(pipe=3, data=1, model=0): 28, ProcessCoord(pipe=3, data=1, model=1): 29, ProcessCoord(pipe=3, data=1, model=2): 30, ProcessCoord(pipe=3, data=1, model=3): 31, ProcessCoord(pipe=4, data=0, model=0): 32, ProcessCoord(pipe=4, data=0, model=1): 33, ProcessCoord(pipe=4, data=0, model=2): 34, ProcessCoord(pipe=4, data=0, model=3): 35, ProcessCoord(pipe=4, data=1, model=0): 36, ProcessCoord(pipe=4, data=1, 
model=1): 37, ProcessCoord(pipe=4, data=1, model=2): 38, ProcessCoord(pipe=4, data=1, model=3): 39, ProcessCoord(pipe=5, data=0, model=0): 40, ProcessCoord(pipe=5, data=0, model=1): 41, ProcessCoord(pipe=5, data=0, model=2): 42, ProcessCoord(pipe=5, data=0, model=3): 43, ProcessCoord(pipe=5, data=1, model=0): 44, ProcessCoord(pipe=5, data=1, model=1): 45, ProcessCoord(pipe=5, data=1, model=2): 46, ProcessCoord(pipe=5, data=1, model=3): 47, ProcessCoord(pipe=6, data=0, model=0): 48, ProcessCoord(pipe=6, data=0, model=1): 49, ProcessCoord(pipe=6, data=0, model=2): 50, ProcessCoord(pipe=6, data=0, model=3): 51, ProcessCoord(pipe=6, data=1, model=0): 52, ProcessCoord(pipe=6, data=1, model=1): 53, ProcessCoord(pipe=6, data=1, model=2): 54, ProcessCoord(pipe=6, data=1, model=3): 55, ProcessCoord(pipe=7, data=0, model=0): 56, ProcessCoord(pipe=7, data=0, model=1): 57, ProcessCoord(pipe=7, data=0, model=2): 58, ProcessCoord(pipe=7, data=0, model=3): 59, ProcessCoord(pipe=7, data=1, model=0): 60, ProcessCoord(pipe=7, data=1, model=1): 61, ProcessCoord(pipe=7, data=1, model=2): 62, ProcessCoord(pipe=7, data=1, model=3): 63, ProcessCoord(pipe=8, data=0, model=0): 64, ProcessCoord(pipe=8, data=0, model=1): 65, ProcessCoord(pipe=8, data=0, model=2): 66, ProcessCoord(pipe=8, data=0, model=3): 67, ProcessCoord(pipe=8, data=1, model=0): 68, ProcessCoord(pipe=8, data=1, model=1): 69, ProcessCoord(pipe=8, data=1, model=2): 70, ProcessCoord(pipe=8, data=1, model=3): 71, ProcessCoord(pipe=9, data=0, model=0): 72, ProcessCoord(pipe=9, data=0, model=1): 73, ProcessCoord(pipe=9, data=0, model=2): 74, ProcessCoord(pipe=9, data=0, model=3): 75, ProcessCoord(pipe=9, data=1, model=0): 76, ProcessCoord(pipe=9, data=1, model=1): 77, ProcessCoord(pipe=9, data=1, model=2): 78, ProcessCoord(pipe=9, data=1, model=3): 79, ProcessCoord(pipe=10, data=0, model=0): 80, ProcessCoord(pipe=10, data=0, model=1): 81, ProcessCoord(pipe=10, data=0, model=2): 82, ProcessCoord(pipe=10, data=0, model=3): 83, 
ProcessCoord(pipe=10, data=1, model=0): 84, ProcessCoord(pipe=10, data=1, model=1): 85, ProcessCoord(pipe=10, data=1, model=2): 86, ProcessCoord(pipe=10, data=1, model=3): 87, ProcessCoord(pipe=11, data=0, model=0): 88, ProcessCoord(pipe=11, data=0, model=1): 89, ProcessCoord(pipe=11, data=0, model=2): 90, ProcessCoord(pipe=11, data=0, model=3): 91, ProcessCoord(pipe=11, data=1, model=0): 92, ProcessCoord(pipe=11, data=1, model=1): 93, ProcessCoord(pipe=11, data=1, model=2): 94, ProcessCoord(pipe=11, data=1, model=3): 95, ProcessCoord(pipe=12, data=0, model=0): 96, ProcessCoord(pipe=12, data=0, model=1): 97, ProcessCoord(pipe=12, data=0, model=2): 98, ProcessCoord(pipe=12, data=0, model=3): 99, ProcessCoord(pipe=12, data=1, model=0): 100, ProcessCoord(pipe=12, data=1, model=1): 101, ProcessCoord(pipe=12, data=1, model=2): 102, ProcessCoord(pipe=12, data=1, model=3): 103, ProcessCoord(pipe=13, data=0, model=0): 104, ProcessCoord(pipe=13, data=0, model=1): 105, ProcessCoord(pipe=13, data=0, model=2): 106, ProcessCoord(pipe=13, data=0, model=3): 107, ProcessCoord(pipe=13, data=1, model=0): 108, ProcessCoord(pipe=13, data=1, model=1): 109, ProcessCoord(pipe=13, data=1, model=2): 110, ProcessCoord(pipe=13, data=1, model=3): 111, ProcessCoord(pipe=14, data=0, model=0): 112, ProcessCoord(pipe=14, data=0, model=1): 113, ProcessCoord(pipe=14, data=0, model=2): 114, ProcessCoord(pipe=14, data=0, model=3): 115, ProcessCoord(pipe=14, data=1, model=0): 116, ProcessCoord(pipe=14, data=1, model=1): 117, ProcessCoord(pipe=14, data=1, model=2): 118, ProcessCoord(pipe=14, data=1, model=3): 119, ProcessCoord(pipe=15, data=0, model=0): 120, ProcessCoord(pipe=15, data=0, model=1): 121, ProcessCoord(pipe=15, data=0, model=2): 122, ProcessCoord(pipe=15, data=0, model=3): 123, ProcessCoord(pipe=15, data=1, model=0): 124, ProcessCoord(pipe=15, data=1, model=1): 125, ProcessCoord(pipe=15, data=1, model=2): 126, ProcessCoord(pipe=15, data=1, model=3): 127} [2022-01-26 19:05:06,973] [INFO] 
[module.py:365:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=7
    0: _to_float16
    1: EmbeddingPipe
    2:
    3: ParallelTransformerLayerPipe
    4: ParallelTransformerLayerPipe
    5: ParallelTransformerLayerPipe
    6: ParallelTransformerLayerPipe
stage=1 layers=4
    7: ParallelTransformerLayerPipe
    8: ParallelTransformerLayerPipe
    9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
stage=2 layers=4
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=3 layers=4
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
stage=4 layers=4
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
stage=5 layers=4
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
stage=6 layers=4
    27: ParallelTransformerLayerPipe
    28: ParallelTransformerLayerPipe
    29: ParallelTransformerLayerPipe
    30: ParallelTransformerLayerPipe
stage=7 layers=4
    31: ParallelTransformerLayerPipe
    32: ParallelTransformerLayerPipe
    33: ParallelTransformerLayerPipe
    34: ParallelTransformerLayerPipe
stage=8 layers=4
    35: ParallelTransformerLayerPipe
    36: ParallelTransformerLayerPipe
    37: ParallelTransformerLayerPipe
    38: ParallelTransformerLayerPipe
stage=9 layers=4
    39: ParallelTransformerLayerPipe
    40: ParallelTransformerLayerPipe
    41: ParallelTransformerLayerPipe
    42: ParallelTransformerLayerPipe
stage=10 layers=4
    43: ParallelTransformerLayerPipe
    44: ParallelTransformerLayerPipe
    45: ParallelTransformerLayerPipe
    46: ParallelTransformerLayerPipe
stage=11 layers=4
    47: ParallelTransformerLayerPipe
    48: ParallelTransformerLayerPipe
    49: ParallelTransformerLayerPipe
    50: ParallelTransformerLayerPipe
stage=12 layers=4
    51: ParallelTransformerLayerPipe
    52: ParallelTransformerLayerPipe
    53: ParallelTransformerLayerPipe
    54: ParallelTransformerLayerPipe
stage=13 layers=4
    55: ParallelTransformerLayerPipe
    56: ParallelTransformerLayerPipe
    57: ParallelTransformerLayerPipe
    58: ParallelTransformerLayerPipe
stage=14 layers=4
    59: ParallelTransformerLayerPipe
    60: ParallelTransformerLayerPipe
    61: ParallelTransformerLayerPipe
    62: ParallelTransformerLayerPipe
stage=15 layers=8
    63: ParallelTransformerLayerPipe
    64: ParallelTransformerLayerPipe
    65: ParallelTransformerLayerPipe
    66: ParallelTransformerLayerPipe
    67:
    68: MixedFusedLayerNorm
    69: EmbeddingPipe
    70: float16_to_fp32
  loss: CrossEntropy
[2022-01-26 19:05:07,753] [INFO] [utils.py:824:see_memory_usage] After Building Model
[2022-01-26 19:05:07,753] [INFO] [utils.py:825:see_memory_usage] MA 3.38 GB Max_MA 3.38 GB CA 3.43 GB Max_CA 3 GB
[2022-01-26 19:05:07,754] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 39.87 GB, percent = 7.9%
setting training iterations to 292968
> learning rate decay style: cosine
DeepSpeed is enabled.
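Note on the "Using topology" dump above: the 128 ranks appear to be laid out in row-major order over the axes (pipe, data, model), with sizes 16, 2 and 4 read off the printed coordinates. The sketch below reproduces that mapping so individual ranks can be checked by hand. It is only an illustration under that assumption; the helper names coord_to_rank and rank_to_coord are hypothetical and are not DeepSpeed API.

```python
from itertools import product
from typing import Dict, Tuple

# Axis sizes inferred from the logged ProcessCoord entries (assumption):
# 16 pipeline stages, 2 data-parallel replicas, 4 model-parallel shards.
PIPE, DATA, MODEL = 16, 2, 4


def coord_to_rank(pipe: int, data: int, model: int) -> int:
    """Row-major rank for axes ordered (pipe, data, model)."""
    return (pipe * DATA + data) * MODEL + model


def rank_to_coord(rank: int) -> Tuple[int, int, int]:
    """Inverse of coord_to_rank: recover (pipe, data, model) from a rank."""
    pipe, rest = divmod(rank, DATA * MODEL)
    data, model = divmod(rest, MODEL)
    return pipe, data, model


# Rebuild the full coordinate -> rank table in the same order as the log.
topology: Dict[Tuple[int, int, int], int] = {
    coord: coord_to_rank(*coord)
    for coord in product(range(PIPE), range(DATA), range(MODEL))
}

if __name__ == "__main__":
    # Spot checks against entries printed in the log.
    print(coord_to_rank(0, 1, 0))    # logged as rank 4
    print(coord_to_rank(10, 1, 0))   # logged as rank 84
    print(rank_to_coord(127))        # logged as pipe=15, data=1, model=3
```

Under this layout, ranks that share a pipe index form one pipeline stage (8 consecutive ranks each), and within a stage each run of 4 consecutive ranks forms one model-parallel group.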
[2022-01-26 19:05:07,831] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed info: version=0.6.0+24fe7002, git-hash=24fe7002, git-branch=elastic-ckpt-refresh [2022-01-26 19:05:08,654] [INFO] [engine.py:275:__init__] DeepSpeed Flops Profiler Enabled: False [2022-01-26 19:05:08,654] [INFO] [engine.py:1093:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer [2022-01-26 19:05:08,654] [INFO] [engine.py:1099:_configure_optimizer] Using client Optimizer as basic optimizer [2022-01-26 19:05:08,655] [INFO] [engine.py:1115:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam [2022-01-26 19:05:08,655] [INFO] [utils.py:44:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type= [2022-01-26 19:05:08,655] [INFO] [logging.py:69:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer [2022-01-26 19:05:08,655] [INFO] [stage_1_and_2.py:121:__init__] Reduce bucket size 500000000 [2022-01-26 19:05:08,655] [INFO] [stage_1_and_2.py:122:__init__] Allgather bucket size 500000000 [2022-01-26 19:05:08,655] [INFO] [stage_1_and_2.py:123:__init__] CPU Offload: False [2022-01-26 19:05:08,655] [INFO] [stage_1_and_2.py:124:__init__] Round robin gradient partitioning: False Rank: 64 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 59 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 84 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 85 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 9 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 8 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 26 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 27 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 68 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 82 partition count [2, 2] and sizes[(807360000, False), 
(179800, False)] Rank: 69 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 14 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 58 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 15 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 31 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 21 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 20 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 65 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 83 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 77 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 24 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 30 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 25 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 62 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 63 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 41 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 40 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 76 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 43 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 42 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 121 partition count [2, 2] and sizes[(892741800, False), (197200, False)] Rank: 120 partition count [2, 2] and sizes[(892741800, False), (197200, False)] Rank: 2 partition count [2, 2] and sizes[(892741800, False), (185600, False)] Rank: 4 partition count [2, 2] and sizes[(892741800, False), (185600, False)] Rank: 5 partition count [2, 2] and sizes[(892741800, False), (185600, False)] Rank: 3 
partition count [2, 2] and sizes[(892741800, False), (185600, False)] Rank: 38 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 39 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 99 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 12 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 13 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 18 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 19 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 67 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 10 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 11 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 66 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 81 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 80 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 102 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 29 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 92 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 28 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 34 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 36 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 114 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 110 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 37 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 98 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 32 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 97 partition count [2, 
2] and sizes[(807360000, False), (179800, False)] Rank: 111 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 105 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 104 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 103 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 96 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 35 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 33 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 79 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 93 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 78 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 72 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 73 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 61 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 60 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 89 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 88 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 115 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 119 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 118 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 51 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 49 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 55 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 54 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 50 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 48 partition count [2, 2] and 
sizes[(807360000, False), (179800, False)] Rank: 123 partition count [2, 2] and sizes[(892741800, False), (197200, False)] Rank: 122 partition count [2, 2] and sizes[(892741800, False), (197200, False)] Rank: 125 partition count [2, 2] and sizes[(892741800, False), (197200, False)] Rank: 126 partition count [2, 2] and sizes[(892741800, False), (197200, False)] Rank: 124 partition count [2, 2] and sizes[(892741800, False), (197200, False)] Rank: 127 partition count [2, 2] and sizes[(892741800, False), (197200, False)] Rank: 109 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 47 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 108 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 107 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 106 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 46 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 23 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 22 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 101 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 95 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 94 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 100 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 113 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 112 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 116 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 117 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 90 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 91 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 16 partition count [2, 2] and 
sizes[(807360000, False), (179800, False)] Rank: 17 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 86 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 71 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 70 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 87 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 44 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 45 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 52 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 53 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 74 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 75 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 57 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 56 partition count [2, 2] and sizes[(807360000, False), (179800, False)] Rank: 1 partition count [2, 2] and sizes[(892741800, False), (185600, False)] Rank: 0 partition count [2, 2] and sizes[(892741800, False), (185600, False)] Rank: 6 partition count [2, 2] and sizes[(892741800, False), (185600, False)] Rank: 7 partition count [2, 2] and sizes[(892741800, False), (185600, False)] [2022-01-26 19:05:18,418] [INFO] [utils.py:824:see_memory_usage] Before initializing optimizer states [2022-01-26 19:05:18,418] [INFO] [utils.py:825:see_memory_usage] MA 6.66 GB Max_MA 8.32 GB CA 11.79 GB Max_CA 12 GB [2022-01-26 19:05:18,418] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 39.95 GB, percent = 7.9% [2022-01-26 19:05:18,480] [INFO] [utils.py:824:see_memory_usage] After initializing optimizer states [2022-01-26 19:05:18,481] [INFO] [utils.py:825:see_memory_usage] MA 13.31 GB Max_MA 16.64 GB CA 21.77 GB Max_CA 22 GB [2022-01-26 19:05:18,481] [INFO] 
[utils.py:833:see_memory_usage] CPU Virtual Memory: used = 39.97 GB, percent = 7.9% [2022-01-26 19:05:18,481] [INFO] [stage_1_and_2.py:493:__init__] optimizer state initialized [2022-01-26 19:05:18,507] [INFO] [utils.py:824:see_memory_usage] After initializing ZeRO optimizer [2022-01-26 19:05:18,507] [INFO] [utils.py:825:see_memory_usage] MA 13.31 GB Max_MA 13.31 GB CA 21.77 GB Max_CA 22 GB [2022-01-26 19:05:18,507] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 39.97 GB, percent = 7.9% [2022-01-26 19:05:18,507] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam [2022-01-26 19:05:18,507] [INFO] [engine.py:805:_configure_lr_scheduler] DeepSpeed using client LR scheduler [2022-01-26 19:05:18,507] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed LR Scheduler = [2022-01-26 19:05:18,507] [INFO] [logging.py:69:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.95), (0.9, 0.95)] [2022-01-26 19:05:18,507] [INFO] [config.py:1058:print] DeepSpeedEngine configuration: [2022-01-26 19:05:18,508] [INFO] [config.py:1062:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2022-01-26 19:05:18,508] [INFO] [config.py:1062:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2022-01-26 19:05:18,508] [INFO] [config.py:1062:print] amp_enabled .................. False [2022-01-26 19:05:18,508] [INFO] [config.py:1062:print] amp_params ................... False [2022-01-26 19:05:18,508] [INFO] [config.py:1062:print] autotuning_config ............ 
{ "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": null, "exps_dir": null, "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 } [2022-01-26 19:05:18,508] [INFO] [config.py:1062:print] bfloat16_enabled ............. False [2022-01-26 19:05:18,508] [INFO] [config.py:1062:print] checkpoint_tag_validation_enabled True [2022-01-26 19:05:18,508] [INFO] [config.py:1062:print] checkpoint_tag_validation_fail False [2022-01-26 19:05:18,508] [INFO] [config.py:1062:print] communication_data_type ...... None [2022-01-26 19:05:18,508] [INFO] [config.py:1062:print] curriculum_enabled ........... True [2022-01-26 19:05:18,508] [INFO] [config.py:1062:print] curriculum_params ............ {'curriculum_type': 'seqlen', 'min_difficulty': 64, 'max_difficulty': 2048, 'schedule_type': 'fixed_linear', 'schedule_config': {'total_curriculum_step': 36000, 'difficulty_step': 8}} [2022-01-26 19:05:18,508] [INFO] [config.py:1062:print] dataloader_drop_last ......... False [2022-01-26 19:05:18,508] [INFO] [config.py:1062:print] disable_allgather ............ False [2022-01-26 19:05:18,508] [INFO] [config.py:1062:print] dump_state ................... False [2022-01-26 19:05:18,508] [INFO] [config.py:1062:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1} [2022-01-26 19:05:18,508] [INFO] [config.py:1062:print] eigenvalue_enabled ........... 
False [2022-01-26 19:05:18,508] [INFO] [config.py:1062:print] eigenvalue_gas_boundary_resolution 1 [2022-01-26 19:05:18,508] [INFO] [config.py:1062:print] eigenvalue_layer_name ........ bert.encoder.layer [2022-01-26 19:05:18,508] [INFO] [config.py:1062:print] eigenvalue_layer_num ......... 0 [2022-01-26 19:05:18,508] [INFO] [config.py:1062:print] eigenvalue_max_iter .......... 100 [2022-01-26 19:05:18,508] [INFO] [config.py:1062:print] eigenvalue_stability ......... 1e-06 [2022-01-26 19:05:18,508] [INFO] [config.py:1062:print] eigenvalue_tol ............... 0.01 [2022-01-26 19:05:18,508] [INFO] [config.py:1062:print] eigenvalue_verbose ........... False [2022-01-26 19:05:18,508] [INFO] [config.py:1062:print] elasticity_enabled ........... False [2022-01-26 19:05:18,508] [INFO] [config.py:1062:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2022-01-26 19:05:18,508] [INFO] [config.py:1062:print] fp16_enabled ................. True [2022-01-26 19:05:18,508] [INFO] [config.py:1062:print] fp16_master_weights_and_gradients False [2022-01-26 19:05:18,508] [INFO] [config.py:1062:print] fp16_mixed_quantize .......... False [2022-01-26 19:05:18,508] [INFO] [config.py:1062:print] global_rank .................. 0 [2022-01-26 19:05:18,508] [INFO] [config.py:1062:print] gradient_accumulation_steps .. 1024 [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] gradient_clipping ............ 1.0 [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] gradient_predivide_factor .... 1.0 [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] initial_dynamic_scale ........ 4096 [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] loss_scale ................... 0 [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] memory_breakdown ............. False [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] optimizer_legacy_fusion ...... 
False [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] optimizer_name ............... None [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] optimizer_params ............. None [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] pld_enabled .................. False [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] pld_params ................... False [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] prescale_gradients ........... False [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] quantize_change_rate ......... 0.001 [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] quantize_groups .............. 1 [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] quantize_offset .............. 1000 [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] quantize_period .............. 1000 [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] quantize_rounding ............ 0 [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] quantize_start_bits .......... 16 [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] quantize_target_bits ......... 8 [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] quantize_training_enabled .... False [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] quantize_type ................ 0 [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] quantize_verbose ............. False [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] scheduler_name ............... None [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] scheduler_params ............. None [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] sparse_attention ............. None [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] sparse_gradients_enabled ..... 
False [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] steps_per_print .............. 2000 [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] tensorboard_enabled .......... False [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] tensorboard_job_name ......... DeepSpeedJobName [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] tensorboard_output_path ...... [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] train_batch_size ............. 2048 [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] train_micro_batch_size_per_gpu 1 [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] use_quantizer_kernel ......... False [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] wall_clock_breakdown ......... False [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] world_size ................... 2 [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] zero_allow_untested_optimizer False [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] zero_config .................. { "stage": 1, "contiguous_gradients": true, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": false, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false } [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] zero_enabled ................. True [2022-01-26 19:05:18,509] [INFO] [config.py:1062:print] zero_optimization_stage ...... 
1 [2022-01-26 19:05:18,509] [INFO] [config.py:1064:print] json = { "train_micro_batch_size_per_gpu": 1, "train_batch_size": 2.048000e+03, "gradient_clipping": 1.0, "elastic_checkpoint": true, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "curriculum_learning": { "enabled": true, "curriculum_type": "seqlen", "min_difficulty": 64, "max_difficulty": 2.048000e+03, "schedule_type": "fixed_linear", "schedule_config": { "total_curriculum_step": 3.600000e+04, "difficulty_step": 8 } }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false } [2022-01-26 19:05:18,509] [INFO] [engine.py:93:__init__] CONFIG: micro_batches=1024 micro_batch_size=1 [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=0 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1785854800 (1785.855M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=2 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1785854800 (1785.855M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=1 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1785854800 (1785.855M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=3 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1785854800 (1785.855M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=66 STAGE=8 LAYERS=4 [35, 39) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=64 STAGE=8 LAYERS=4 [35, 39) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) 
UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=67 STAGE=8 LAYERS=4 [35, 39) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=65 STAGE=8 LAYERS=4 [35, 39) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=97 STAGE=12 LAYERS=4 [51, 55) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=98 STAGE=12 LAYERS=4 [51, 55) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=35 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=33 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=32 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=34 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=96 STAGE=12 LAYERS=4 [51, 55) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=19 STAGE=2 
LAYERS=4 [11, 15) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=49 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=99 STAGE=12 LAYERS=4 [51, 55) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=16 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=17 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=81 STAGE=10 LAYERS=4 [43, 47) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=83 STAGE=10 LAYERS=4 [43, 47) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=18 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=50 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=48 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) 
UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=82 STAGE=10 LAYERS=4 [43, 47) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=114 STAGE=14 LAYERS=4 [59, 63) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=112 STAGE=14 LAYERS=4 [59, 63) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=115 STAGE=14 LAYERS=4 [59, 63) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=113 STAGE=14 LAYERS=4 [59, 63) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=104 STAGE=13 LAYERS=4 [55, 59) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=107 STAGE=13 LAYERS=4 [55, 59) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=51 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=10 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=9 
STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=11 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=8 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=56 STAGE=7 LAYERS=4 [31, 35) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=80 STAGE=10 LAYERS=4 [43, 47) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=40 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=121 STAGE=15 LAYERS=8 [63, 71) STAGE_PARAMS=1785878000 (1785.878M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=122 STAGE=15 LAYERS=8 [63, 71) STAGE_PARAMS=1785878000 (1785.878M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=123 STAGE=15 LAYERS=8 [63, 71) STAGE_PARAMS=1785878000 (1785.878M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=42 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) 
UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=27 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=58 STAGE=7 LAYERS=4 [31, 35) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=88 STAGE=11 LAYERS=4 [47, 51) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=90 STAGE=11 LAYERS=4 [47, 51) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=91 STAGE=11 LAYERS=4 [47, 51) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=106 STAGE=13 LAYERS=4 [55, 59) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=43 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=59 STAGE=7 LAYERS=4 [31, 35) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=89 STAGE=11 LAYERS=4 [47, 51) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=73 
STAGE=9 LAYERS=4 [39, 43) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=74 STAGE=9 LAYERS=4 [39, 43) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=105 STAGE=13 LAYERS=4 [55, 59) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=41 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=24 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=25 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=120 STAGE=15 LAYERS=8 [63, 71) STAGE_PARAMS=1785878000 (1785.878M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=57 STAGE=7 LAYERS=4 [31, 35) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=72 STAGE=9 LAYERS=4 [39, 43) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=26 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) 
UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:19,652] [INFO] [engine.py:151:__init__] RANK=75 STAGE=9 LAYERS=4 [39, 43) STAGE_PARAMS=1615079600 (1615.080M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:05:24,839] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 44 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2597, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/zero/stage_1_and_2.py", line 2154, in load_state_dict current_rank_sd = state_dict_list[dp_rank] IndexError: list index out of range [2022-01-26 19:05:25,441] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 124 Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2597, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/zero/stage_1_and_2.py", line 2154, in load_state_dict current_rank_sd = state_dict_list[dp_rank] IndexError: list index out of range [2022-01-26 19:05:26,296] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 28 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 
402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2597, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/zero/stage_1_and_2.py", line 2154, in load_state_dict current_rank_sd = state_dict_list[dp_rank] IndexError: list index out of range [2022-01-26 19:05:26,381] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 68 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2597, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/zero/stage_1_and_2.py", line 2154, in load_state_dict current_rank_sd = state_dict_list[dp_rank] IndexError: list index out of range Killing subprocess 202077 Killing subprocess 202078 Killing subprocess 202079 Killing subprocess 202080 Killing subprocess 202081 Killing subprocess 202083 Killing subprocess 202085 Killing subprocess 202086 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in main() File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main sigkill_handler(signal.SIGTERM, None) # not coming back File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd) subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/cutting-edge/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py', '--local_rank=7', '--tensor-model-parallel-size', '4', '--pipeline-model-parallel-size', '16', '--num-layers', '64', '--hidden-size', '11600', '--num-attention-heads', '80', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', '--global-batch-size', '2048', '--train-samples', '600_000_000', '--train-tokens', '300_000_000_000', 
'--vocab-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--init-method-std', '0.006', '--fp16', '--checkpoint-activations', '--embed-layernorm', '--seed', '43', '--optimizer', 'adam', '--adam-beta1', '0.9', '--adam-beta2', '0.95', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr', '6e-6', '--lr-warmup-samples', '3_750_000', '--lr-decay-tokens', '260000000000', '--lr-decay-style', 'cosine', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1185', '--log-interval', '1', '--save-interval', '200', '--eval-interval', '150', '--eval-iters', '5', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs//tensorboard/cl-a100', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', '--deepspeed_config', './ds_config.1519422.json', '--zero-stage', '1', '--deepspeed-activation-checkpointing']' returned non-zero exit status 1. 
[2022-01-26 19:05:26,703] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 108 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2597, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/zero/stage_1_and_2.py", line 2154, in load_state_dict current_rank_sd = state_dict_list[dp_rank] IndexError: list index out of range [2022-01-26 19:05:27,216] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 92 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, 
lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2597, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/zero/stage_1_and_2.py", line 2154, in load_state_dict current_rank_sd = state_dict_list[dp_rank] IndexError: list index out of range Killing subprocess 201316 Killing subprocess 201317 Killing subprocess 201318 Killing subprocess 201319 Killing subprocess 201320 Killing subprocess 201322 Killing subprocess 201324 Killing subprocess 201327 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in main() File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main sigkill_handler(signal.SIGTERM, None) # not coming back File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler raise 
subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd) subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/cutting-edge/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py', '--local_rank=7', '--tensor-model-parallel-size', '4', '--pipeline-model-parallel-size', '16', '--num-layers', '64', '--hidden-size', '11600', '--num-attention-heads', '80', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', '--global-batch-size', '2048', '--train-samples', '600_000_000', '--train-tokens', '300_000_000_000', '--vocab-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--init-method-std', '0.006', '--fp16', '--checkpoint-activations', '--embed-layernorm', '--seed', '43', '--optimizer', 'adam', '--adam-beta1', '0.9', '--adam-beta2', '0.95', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr', '6e-6', '--lr-warmup-samples', '3_750_000', '--lr-decay-tokens', '260000000000', '--lr-decay-style', 'cosine', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1185', '--log-interval', '1', '--save-interval', '200', '--eval-interval', '150', '--eval-iters', '5', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs//tensorboard/cl-a100', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', 
'--deepspeed_config', './ds_config.1519422.json', '--zero-stage', '1', '--deepspeed-activation-checkpointing']' returned non-zero exit status 1. Killing subprocess 202255 Killing subprocess 202256 Killing subprocess 202257 Killing subprocess 202258 Killing subprocess 202259 Killing subprocess 202260 Killing subprocess 202262 Killing subprocess 202264 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in main() File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main sigkill_handler(signal.SIGTERM, None) # not coming back File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd) subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/cutting-edge/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py', '--local_rank=7', '--tensor-model-parallel-size', '4', '--pipeline-model-parallel-size', '16', '--num-layers', '64', '--hidden-size', '11600', '--num-attention-heads', '80', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', '--global-batch-size', '2048', '--train-samples', '600_000_000', '--train-tokens', '300_000_000_000', '--vocab-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt', '--loss-scale', '12', 
'--init-method-std', '0.006', '--fp16', '--checkpoint-activations', '--embed-layernorm', '--seed', '43', '--optimizer', 'adam', '--adam-beta1', '0.9', '--adam-beta2', '0.95', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr', '6e-6', '--lr-warmup-samples', '3_750_000', '--lr-decay-tokens', '260000000000', '--lr-decay-style', 'cosine', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1185', '--log-interval', '1', '--save-interval', '200', '--eval-interval', '150', '--eval-iters', '5', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs//tensorboard/cl-a100', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', '--deepspeed_config', './ds_config.1519422.json', '--zero-stage', '1', '--deepspeed-activation-checkpointing']' returned non-zero exit status 1. 
[2022-01-26 19:05:27,877] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 116 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2597, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/zero/stage_1_and_2.py", line 2154, in load_state_dict current_rank_sd = state_dict_list[dp_rank] IndexError: list index out of range [2022-01-26 19:05:28,194] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 125 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, 
lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2597, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/zero/stage_1_and_2.py", line 2154, in load_state_dict current_rank_sd = state_dict_list[dp_rank] IndexError: list index out of range [2022-01-26 19:05:28,330] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 84 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain Killing subprocess 203696 Killing subprocess 203697 Killing subprocess 203698 Killing subprocess 203699 Killing subprocess 203700 Killing subprocess 203702 Killing subprocess 203704 Killing subprocess 203706 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 87, in _run_code exec(code, 
run_globals) File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2597, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/zero/stage_1_and_2.py", line 2154, in load_state_dict current_rank_sd = state_dict_list[dp_rank] IndexError: list index out of range main() File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main sigkill_handler(signal.SIGTERM, None) # not coming back File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd) subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/cutting-edge/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py', '--local_rank=7', '--tensor-model-parallel-size', '4', '--pipeline-model-parallel-size', '16', '--num-layers', '64', '--hidden-size', '11600', '--num-attention-heads', '80', '--seq-length', '2048', 
'--max-position-embeddings', '2048', '--micro-batch-size', '1', '--global-batch-size', '2048', '--train-samples', '600_000_000', '--train-tokens', '300_000_000_000', '--vocab-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--init-method-std', '0.006', '--fp16', '--checkpoint-activations', '--embed-layernorm', '--seed', '43', '--optimizer', 'adam', '--adam-beta1', '0.9', '--adam-beta2', '0.95', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr', '6e-6', '--lr-warmup-samples', '3_750_000', '--lr-decay-tokens', '260000000000', '--lr-decay-style', 'cosine', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1185', '--log-interval', '1', '--save-interval', '200', '--eval-interval', '150', '--eval-iters', '5', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs//tensorboard/cl-a100', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', '--deepspeed_config', './ds_config.1519422.json', '--zero-stage', '1', '--deepspeed-activation-checkpointing']' returned non-zero exit status 1. 
[2022-01-26 19:05:28,373] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 60 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2597, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/zero/stage_1_and_2.py", line 2154, in load_state_dict current_rank_sd = state_dict_list[dp_rank] IndexError: list index out of range Killing subprocess 216487 Killing subprocess 216488 Killing subprocess 216489 Killing subprocess 216490 Killing subprocess 216492 Killing subprocess 216494 Killing subprocess 216496 Killing subprocess 216498 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 87, in 
_run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in [2022-01-26 19:05:28,433] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 20 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2597, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/zero/stage_1_and_2.py", line 2154, in load_state_dict current_rank_sd = state_dict_list[dp_rank] IndexError: list index out of range main() File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main sigkill_handler(signal.SIGTERM, None) # not coming back File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler 
raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd) subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/cutting-edge/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py', '--local_rank=7', '--tensor-model-parallel-size', '4', '--pipeline-model-parallel-size', '16', '--num-layers', '64', '--hidden-size', '11600', '--num-attention-heads', '80', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', '--global-batch-size', '2048', '--train-samples', '600_000_000', '--train-tokens', '300_000_000_000', '--vocab-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--init-method-std', '0.006', '--fp16', '--checkpoint-activations', '--embed-layernorm', '--seed', '43', '--optimizer', 'adam', '--adam-beta1', '0.9', '--adam-beta2', '0.95', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr', '6e-6', '--lr-warmup-samples', '3_750_000', '--lr-decay-tokens', '260000000000', '--lr-decay-style', 'cosine', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1185', '--log-interval', '1', '--save-interval', '200', '--eval-interval', '150', '--eval-iters', '5', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs//tensorboard/cl-a100', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', 
'--deepspeed_config', './ds_config.1519422.json', '--zero-stage', '1', '--deepspeed-activation-checkpointing']' returned non-zero exit status 1. [2022-01-26 19:05:28,499] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 37 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2597, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/zero/stage_1_and_2.py", line 2154, in load_state_dict current_rank_sd = state_dict_list[dp_rank] IndexError: list index out of range [2022-01-26 19:05:28,572] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 4 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, 
File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain > using checkpoint value 6e-05 for learning rate > using checkpoint value 6e-06 for minimum learning rate > using checkpoint value 3750000 for warmup iterations > using checkpoint value 600000000 for total number of iterations > using checkpoint value cosine for decay style model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2597, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/zero/stage_1_and_2.py", line 2154, in load_state_dict current_rank_sd = state_dict_list[dp_rank] IndexError: list index out of range [2022-01-26 19:05:28,780] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 36 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2597, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/zero/stage_1_and_2.py", line 2154, in load_state_dict current_rank_sd = state_dict_list[dp_rank] IndexError: list index out of range [2022-01-26 19:05:28,914] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 76 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2597, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/zero/stage_1_and_2.py", line 2154, in load_state_dict current_rank_sd = state_dict_list[dp_rank] IndexError: list index out of range [2022-01-26 19:05:29,083] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 5 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2597, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/zero/stage_1_and_2.py", line 2154, in load_state_dict 
current_rank_sd = state_dict_list[dp_rank] IndexError: list index out of range [2022-01-26 19:05:29,215] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 12 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2597, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/zero/stage_1_and_2.py", line 2154, in load_state_dict current_rank_sd = state_dict_list[dp_rank] IndexError: list index out of range [2022-01-26 19:05:29,432] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 103 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain Killing subprocess 216465 Killing subprocess 216466 Killing subprocess 216467 Killing subprocess 216468 Killing subprocess 216470 Killing subprocess 216471 Killing subprocess 216473 Killing subprocess 216476 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in main() File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main sigkill_handler(signal.SIGTERM, None) # not coming back File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd) subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/cutting-edge/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py', '--local_rank=7', '--tensor-model-parallel-size', '4', '--pipeline-model-parallel-size', '16', '--num-layers', '64', '--hidden-size', '11600', '--num-attention-heads', '80', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', '--global-batch-size', '2048', '--train-samples', '600_000_000', '--train-tokens', '300_000_000_000', '--vocab-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--init-method-std', '0.006', 
'--fp16', '--checkpoint-activations', '--embed-layernorm', '--seed', '43', '--optimizer', 'adam', '--adam-beta1', '0.9', '--adam-beta2', '0.95', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr', '6e-6', '--lr-warmup-samples', '3_750_000', '--lr-decay-tokens', '260000000000', '--lr-decay-style', 'cosine', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1185', '--log-interval', '1', '--save-interval', '200', '--eval-interval', '150', '--eval-iters', '5', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs//tensorboard/cl-a100', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', '--deepspeed_config', './ds_config.1519422.json', '--zero-stage', '1', '--deepspeed-activation-checkpointing']' returned non-zero exit status 1. 
model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2597, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/zero/stage_1_and_2.py", line 2154, in load_state_dict current_rank_sd = state_dict_list[dp_rank] IndexError: list index out of range [2022-01-26 19:05:29,475] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 6 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2597, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/zero/stage_1_and_2.py", line 2154, in load_state_dict current_rank_sd = state_dict_list[dp_rank] IndexError: list index out of range [2022-01-26 19:05:29,482] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 126 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2597, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/zero/stage_1_and_2.py", line 2154, in load_state_dict 
current_rank_sd = state_dict_list[dp_rank] IndexError: list index out of range Killing subprocess 215693 Killing subprocess 215694 Killing subprocess 215695 Killing subprocess 215696 Killing subprocess 215697 Killing subprocess 215699 Killing subprocess 215701 Killing subprocess 215704 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in main() File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main sigkill_handler(signal.SIGTERM, None) # not coming back File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd) subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/cutting-edge/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py', '--local_rank=7', '--tensor-model-parallel-size', '4', '--pipeline-model-parallel-size', '16', '--num-layers', '64', '--hidden-size', '11600', '--num-attention-heads', '80', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', '--global-batch-size', '2048', '--train-samples', '600_000_000', '--train-tokens', '300_000_000_000', '--vocab-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--init-method-std', '0.006', '--fp16', '--checkpoint-activations', 
'--embed-layernorm', '--seed', '43', '--optimizer', 'adam', '--adam-beta1', '0.9', '--adam-beta2', '0.95', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr', '6e-6', '--lr-warmup-samples', '3_750_000', '--lr-decay-tokens', '260000000000', '--lr-decay-style', 'cosine', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1185', '--log-interval', '1', '--save-interval', '200', '--eval-interval', '150', '--eval-iters', '5', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs//tensorboard/cl-a100', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', '--deepspeed_config', './ds_config.1519422.json', '--zero-stage', '1', '--deepspeed-activation-checkpointing']' returned non-zero exit status 1. 
srun: error: jean-zay-iam37: task 5: Exited with exit code 1 srun: Terminating job step 1519422.0 [2022-01-26 19:05:29,656] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 101 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2597, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/zero/stage_1_and_2.py", line 2154, in load_state_dict current_rank_sd = state_dict_list[dp_rank] IndexError: list index out of range [2022-01-26 19:05:30,037] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 63 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2597, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/zero/stage_1_and_2.py", line 2154, in load_state_dict current_rank_sd = state_dict_list[dp_rank] IndexError: list index out of range [2022-01-26 19:05:30,039] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 100 Traceback (most recent call last): File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 136, in pretrain model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 402, in setup_model_and_optimizer args.iteration = load_checkpoint(model, optimizer, lr_scheduler) File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/checkpointing.py", line 275, in load_checkpoint loaded_dir, state_dict = model[0].load_checkpoint(load_dir) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2449, in load_checkpoint success = self._load_zero_checkpoint( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/engine.py", line 2597, in _load_zero_checkpoint self.optimizer.load_state_dict( File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/zero/stage_1_and_2.py", line 2154, in load_state_dict current_rank_sd = state_dict_list[dp_rank] IndexError: list index out of range Killing subprocess 256684 slurmstepd: error: *** STEP 1519422.0 ON jean-zay-iam31 CANCELLED AT 2022-01-26T19:05:30 *** Killing subprocess 256685 Killing subprocess 256686 Killing subprocess 256687 Killing subprocess 256688 Killing subprocess 256690 Killing subprocess 256692 Killing subprocess 256695 Killing subprocess 215845 Killing subprocess 215846 Killing subprocess 209653 Main process received SIGTERM, exiting Killing subprocess 215847 Killing subprocess 209654 Killing subprocess 215986 Killing subprocess 260710 Killing subprocess 216756 Killing subprocess 215848 Killing subprocess 216757 Killing subprocess 209655 Killing subprocess 215987 Killing subprocess 215850 Killing subprocess 215852 Killing subprocess 215854 Killing subprocess 260711 Killing subprocess 216758 Killing subprocess 209656 Killing subprocess 215988 Killing subprocess 215855 Main process received SIGTERM, exiting Killing subprocess 216759 Killing subprocess 209657 Killing subprocess 215989 Killing subprocess 260712 Killing subprocess 219976 Killing subprocess 202306 Killing subprocess 216761 Killing subprocess 209658 Killing subprocess 215990 Killing subprocess 260713 Killing subprocess 202307 Killing subprocess 216763 Killing subprocess 209659 
Killing subprocess 215991 Killing subprocess 260715 Killing subprocess 260717 Killing subprocess 219977 Killing subprocess 202308 Killing subprocess 216765 Killing subprocess 209660 Killing subprocess 215993 Killing subprocess 260719 Killing subprocess 219978 Killing subprocess 202309 Killing subprocess 216766 Main process received SIGTERM, exiting Killing subprocess 215995 Killing subprocess 260722 Killing subprocess 219979 Main process received SIGTERM, exiting Main process received SIGTERM, exiting Killing subprocess 202310 Killing subprocess 202312 Killing subprocess 219981 Killing subprocess 219983 Killing subprocess 202314 Killing subprocess 202316 Killing subprocess 219986 Main process received SIGTERM, exiting Main process received SIGTERM, exiting Killing subprocess 219988 Main process received SIGTERM, exiting Killing subprocess 216976 Killing subprocess 216977 Killing subprocess 216978 Killing subprocess 216979 Killing subprocess 216981 Killing subprocess 216982 Killing subprocess 216984 Killing subprocess 216986 Main process received SIGTERM, exiting srun: error: jean-zay-iam45: task 8: Exited with exit code 1 srun: error: jean-zay-iam33: task 2: Exited with exit code 1 srun: error: jean-zay-iam51: task 14: Exited with exit code 1 srun: error: jean-zay-iam46: task 9: Exited with exit code 1 srun: error: jean-zay-iam44: task 7: Exited with exit code 1 srun: error: jean-zay-iam52: task 15: Exited with exit code 1 srun: error: jean-zay-iam43: task 6: Exited with exit code 1 srun: error: jean-zay-iam34: task 3: Exited with exit code 1 srun: error: jean-zay-iam48: task 11: Exited with exit code 1 srun: error: jean-zay-iam50: task 13: Exited with exit code 1 srun: error: jean-zay-iam31: task 0: Exited with exit code 1 srun: error: jean-zay-iam35: task 4: Exited with exit code 1 srun: error: jean-zay-iam32: task 1: Exited with exit code 1 srun: error: jean-zay-iam49: task 12: Exited with exit code 1 srun: error: jean-zay-iam47: task 10: Exited with exit code 1 
***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** > setting tensorboard ... using world size: 128, data-parallel-size: 1, tensor-model-parallel size: 4, pipeline-model-parallel size: 32 using torch.float16 for parameters ... ------------------------ arguments ------------------------ accumulate_allreduce_grads_in_fp32 .............. False adam_beta1 ...................................... 0.9 adam_beta2 ...................................... 0.95 adam_eps ........................................ 1e-08 adlr_autoresume ................................. False adlr_autoresume_interval ........................ 1000 apply_query_key_layer_scaling ................... True apply_residual_connection_post_layernorm ........ False attention_dropout ............................... 0.1 attention_softmax_in_fp32 ....................... 
False bert_binary_head ................................ True bert_load ....................................... None bf16 ............................................ False bias_dropout_fusion ............................. True bias_gelu_fusion ................................ True biencoder_projection_dim ........................ 0 biencoder_shared_query_context_model ............ False block_data_path ................................. None checkpoint_activations .......................... True checkpoint_in_cpu ............................... False checkpoint_num_layers ........................... 1 clip_grad ....................................... 1.0 codecarbon_dir .................................. None consumed_train_samples .......................... 0 consumed_train_tokens ........................... 0 consumed_valid_samples .......................... 0 contigious_checkpointing ........................ False cpu_optimizer ................................... False cpu_torch_adam .................................. False curriculum_learning ............................. False data_impl ....................................... mmap data_parallel_size .............................. 1 data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document'] dataloader_type ................................. single DDP_impl ........................................ local decoder_seq_length .............................. None deepscale ....................................... False deepscale_config ................................ None deepspeed ....................................... True deepspeed_activation_checkpointing .............. True deepspeed_config ................................ ./ds_config.1519520.json deepspeed_mpi ................................... False distribute_checkpointed_activations ............. False distributed_backend ............................. 
nccl embed_layernorm ................................. True embedding_path .................................. None encoder_seq_length .............................. 2048 eod_mask_loss ................................... False eval_interval ................................... 150 eval_iters ...................................... 5 eval_only ....................................... None evidence_data_path .............................. None exit_duration_in_mins ........................... 1185 exit_interval ................................... None ffn_hidden_size ................................. 46400 finetune ........................................ False fp16 ............................................ True fp16_lm_cross_entropy ........................... False fp32_residual_connection ........................ False gigaflos_no_embeds .............................. 0 global_batch_size ............................... 2048 glu_activation .................................. None hidden_dropout .................................. 0.1 hidden_size ..................................... 11600 hysteresis ...................................... 2 ict_head_size ................................... None ict_load ........................................ None img_dim ......................................... 224 indexer_batch_size .............................. 128 indexer_log_interval ............................ 1000 init_method_std ................................. 0.006 init_method_xavier_uniform ...................... False initial_loss_scale .............................. 4294967296 kv_channels ..................................... 145 layernorm_epsilon ............................... 1e-05 lazy_mpu_init ................................... None load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 local_rank ...................................... 0 log_batch_size_to_tensorboard ................... 
True log_interval .................................... 1 log_learning_rate_to_tensorboard ................ True log_level ....................................... None log_level_replica ............................... None log_loss_scale_to_tensorboard ................... True log_num_zeros_in_grad ........................... False log_params_norm ................................. False log_timers_to_tensorboard ....................... True log_validation_ppl_to_tensorboard ............... True loss_on_targets_only ............................ False loss_scale ...................................... 12.0 loss_scale_window ............................... 1000 lr .............................................. 6e-05 lr_decay_iters .................................. None lr_decay_samples ................................ None lr_decay_style .................................. cosine lr_decay_tokens ................................. 260000000000 lr_warmup_fraction .............................. None lr_warmup_iters ................................. 0 lr_warmup_samples ............................... 3750000 make_vocab_size_divisible_by .................... 128 mask_prob ....................................... 0.15 masked_softmax_fusion ........................... True max_position_embeddings ......................... 2048 memory_centric_tiled_linear ..................... False merge_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt micro_batch_size ................................ 1 min_loss_scale .................................. 1.0 min_lr .......................................... 6e-06 mmap_warmup ..................................... False no_load_optim ................................... None no_load_rng ..................................... None no_save_optim ................................... None no_save_rng ..................................... 
None num_attention_heads ............................. 80 num_channels .................................... 3 num_classes ..................................... 1000 num_layers ...................................... 64 num_layers_per_virtual_pipeline_stage ........... None num_workers ..................................... 2 onnx_safe ....................................... None openai_gelu ..................................... False optimizer ....................................... adam override_lr_scheduler ........................... False params_dtype .................................... torch.float16 partition_activations ........................... False patch_dim ....................................... 16 pipeline_model_parallel_size .................... 32 position_embedding_type ......................... PositionEmbeddingType.absolute profile_backward ................................ False query_in_block_prob ............................. 0.1 rampup_batch_size ............................... None rank ............................................ 0 remote_device ................................... none reset_attention_mask ............................ False reset_position_ids .............................. False retriever_report_topk_accuracies ................ [] retriever_score_scaling ......................... False retriever_seq_length ............................ 256 reweight_loss_based_on_position_frequency ....... False sample_rate ..................................... 1.0 save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 save_interval ................................... 50 scatter_gather_tensors_in_pipeline .............. True scattered_embeddings ............................ False seed ............................................ 43 seq_length ...................................... 2048 sgd_momentum .................................... 
0.9 short_seq_prob .................................. 0.1 skip_train_iteration_range ...................... None split ........................................... 949,50,1 split_transformers .............................. False synchronize_each_layer .......................... False tensor_model_parallel_size ...................... 4 tensorboard_debug_dir ........................... None tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs//tensorboard/cl-a100 tensorboard_log_interval ........................ 1 tensorboard_queue_size .......................... 5 test_weighted_split_names ....................... None test_weighted_split_paths ....................... None test_weighted_split_splits ...................... None test_weighted_split_weights ..................... None tile_factor ..................................... 1 titles_data_path ................................ None tokenizer_name_or_path .......................... None tokenizer_type .................................. GPT2BPETokenizer train_iters ..................................... None train_samples ................................... 600000000 train_tokens .................................... 300000000000 train_weighted_split_paths ...................... None use_bnb_optimizer ............................... False use_checkpoint_lr_scheduler ..................... False use_contiguous_buffers_in_ddp ................... False use_cpu_initialization .......................... None use_one_sent_docs ............................... False use_pin_memory .................................. False valid_weighted_split_names ...................... None valid_weighted_split_paths ...................... None valid_weighted_split_splits ..................... None valid_weighted_split_weights .................... None virtual_pipeline_model_parallel_size ............ None vocab_extra_ids ................................. 
0 vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json weight_decay .................................... 0.1 world_size ...................................... 128 zero_allgather_bucket_size ...................... 0.0 zero_contigious_gradients ....................... False zero_reduce_bucket_size ......................... 0.0 zero_reduce_scatter ............................. False zero_stage ...................................... 1 -------------------- end of arguments --------------------- setting number of micro-batches to constant 2048 > building GPT2BPETokenizer tokenizer ... > padded vocab (size: 50257) with 431 dummy tokens (new size: 50688) -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] cpu_adagrad ............ [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... 
[NO] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] deepspeed info ................... 0.6.0+24fe7002, 24fe7002, elastic-ckpt-refresh deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=f689231 git_branch=log-grad-norm **** > initializing torch distributed ... > initializing tensor model parallel with size 4 > initializing pipeline model parallel with size 32 > setting random seeds to 43 ... [2022-01-26 19:07:53,520] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2761 and data parallel seed: 43 > compiling dataset index builder ... make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/data' make: Nothing to be done for 'default'. make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/data' >>> done with dataset index builder. Compilation time: 0.150 seconds > compiling and loading fused kernels ... /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. 
Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module scaled_upper_triang_masked_softmax_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module scaled_upper_triang_masked_softmax_cuda... /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module scaled_masked_softmax_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. 
Loading extension module scaled_masked_softmax_cuda... /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module fused_mix_prec_layer_norm_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module fused_mix_prec_layer_norm_cuda... >>> done with compiling and loading fused kernels. Compilation time: 6.999 seconds time to initialize megatron (seconds): 0.672 [after megatron is initialized] datetime: 2022-01-26 19:08:00 building GPT model ... 
[2022-01-26 19:08:00,673] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1
[2022-01-26 19:08:00,704] [INFO] [utils.py:824:see_memory_usage] Before Building Model
[2022-01-26 19:08:00,704] [INFO]
[utils.py:825:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB [2022-01-26 19:08:00,704] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 39.42 GB, percent = 7.8% [2022-01-26 19:08:00,705] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=1, data=0, model=0): 4, ProcessCoord(pipe=1, data=0, model=1): 5, ProcessCoord(pipe=1, data=0, model=2): 6, ProcessCoord(pipe=1, data=0, model=3): 7, ProcessCoord(pipe=2, data=0, model=0): 8, ProcessCoord(pipe=2, data=0, model=1): 9, ProcessCoord(pipe=2, data=0, model=2): 10, ProcessCoord(pipe=2, data=0, model=3): 11, ProcessCoord(pipe=3, data=0, model=0): 12, ProcessCoord(pipe=3, data=0, model=1): 13, ProcessCoord(pipe=3, data=0, model=2): 14, ProcessCoord(pipe=3, data=0, model=3): 15, ProcessCoord(pipe=4, data=0, model=0): 16, ProcessCoord(pipe=4, data=0, model=1): 17, ProcessCoord(pipe=4, data=0, model=2): 18, ProcessCoord(pipe=4, data=0, model=3): 19, ProcessCoord(pipe=5, data=0, model=0): 20, ProcessCoord(pipe=5, data=0, model=1): 21, ProcessCoord(pipe=5, data=0, model=2): 22, ProcessCoord(pipe=5, data=0, model=3): 23, ProcessCoord(pipe=6, data=0, model=0): 24, ProcessCoord(pipe=6, data=0, model=1): 25, ProcessCoord(pipe=6, data=0, model=2): 26, ProcessCoord(pipe=6, data=0, model=3): 27, ProcessCoord(pipe=7, data=0, model=0): 28, ProcessCoord(pipe=7, data=0, model=1): 29, ProcessCoord(pipe=7, data=0, model=2): 30, ProcessCoord(pipe=7, data=0, model=3): 31, ProcessCoord(pipe=8, data=0, model=0): 32, ProcessCoord(pipe=8, data=0, model=1): 33, ProcessCoord(pipe=8, data=0, model=2): 34, ProcessCoord(pipe=8, data=0, model=3): 35, ProcessCoord(pipe=9, data=0, model=0): 36, ProcessCoord(pipe=9, data=0, 
model=1): 37, ProcessCoord(pipe=9, data=0, model=2): 38, ProcessCoord(pipe=9, data=0, model=3): 39, ProcessCoord(pipe=10, data=0, model=0): 40, ProcessCoord(pipe=10, data=0, model=1): 41, ProcessCoord(pipe=10, data=0, model=2): 42, ProcessCoord(pipe=10, data=0, model=3): 43, ProcessCoord(pipe=11, data=0, model=0): 44, ProcessCoord(pipe=11, data=0, model=1): 45, ProcessCoord(pipe=11, data=0, model=2): 46, ProcessCoord(pipe=11, data=0, model=3): 47, ProcessCoord(pipe=12, data=0, model=0): 48, ProcessCoord(pipe=12, data=0, model=1): 49, ProcessCoord(pipe=12, data=0, model=2): 50, ProcessCoord(pipe=12, data=0, model=3): 51, ProcessCoord(pipe=13, data=0, model=0): 52, ProcessCoord(pipe=13, data=0, model=1): 53, ProcessCoord(pipe=13, data=0, model=2): 54, ProcessCoord(pipe=13, data=0, model=3): 55, ProcessCoord(pipe=14, data=0, model=0): 56, ProcessCoord(pipe=14, data=0, model=1): 57, ProcessCoord(pipe=14, data=0, model=2): 58, ProcessCoord(pipe=14, data=0, model=3): 59, ProcessCoord(pipe=15, data=0, model=0): 60, ProcessCoord(pipe=15, data=0, model=1): 61, ProcessCoord(pipe=15, data=0, model=2): 62, ProcessCoord(pipe=15, data=0, model=3): 63, ProcessCoord(pipe=16, data=0, model=0): 64, ProcessCoord(pipe=16, data=0, model=1): 65, ProcessCoord(pipe=16, data=0, model=2): 66, ProcessCoord(pipe=16, data=0, model=3): 67, ProcessCoord(pipe=17, data=0, model=0): 68, ProcessCoord(pipe=17, data=0, model=1): 69, ProcessCoord(pipe=17, data=0, model=2): 70, ProcessCoord(pipe=17, data=0, model=3): 71, ProcessCoord(pipe=18, data=0, model=0): 72, ProcessCoord(pipe=18, data=0, model=1): 73, ProcessCoord(pipe=18, data=0, model=2): 74, ProcessCoord(pipe=18, data=0, model=3): 75, ProcessCoord(pipe=19, data=0, model=0): 76, ProcessCoord(pipe=19, data=0, model=1): 77, ProcessCoord(pipe=19, data=0, model=2): 78, ProcessCoord(pipe=19, data=0, model=3): 79, ProcessCoord(pipe=20, data=0, model=0): 80, ProcessCoord(pipe=20, data=0, model=1): 81, ProcessCoord(pipe=20, data=0, model=2): 82, 
ProcessCoord(pipe=20, data=0, model=3): 83, ProcessCoord(pipe=21, data=0, model=0): 84, ProcessCoord(pipe=21, data=0, model=1): 85, ProcessCoord(pipe=21, data=0, model=2): 86, ProcessCoord(pipe=21, data=0, model=3): 87, ProcessCoord(pipe=22, data=0, model=0): 88, ProcessCoord(pipe=22, data=0, model=1): 89, ProcessCoord(pipe=22, data=0, model=2): 90, ProcessCoord(pipe=22, data=0, model=3): 91, ProcessCoord(pipe=23, data=0, model=0): 92, ProcessCoord(pipe=23, data=0, model=1): 93, ProcessCoord(pipe=23, data=0, model=2): 94, ProcessCoord(pipe=23, data=0, model=3): 95, ProcessCoord(pipe=24, data=0, model=0): 96, ProcessCoord(pipe=24, data=0, model=1): 97, ProcessCoord(pipe=24, data=0, model=2): 98, ProcessCoord(pipe=24, data=0, model=3): 99, ProcessCoord(pipe=25, data=0, model=0): 100, ProcessCoord(pipe=25, data=0, model=1): 101, ProcessCoord(pipe=25, data=0, model=2): 102, ProcessCoord(pipe=25, data=0, model=3): 103, ProcessCoord(pipe=26, data=0, model=0): 104, ProcessCoord(pipe=26, data=0, model=1): 105, ProcessCoord(pipe=26, data=0, model=2): 106, ProcessCoord(pipe=26, data=0, model=3): 107, ProcessCoord(pipe=27, data=0, model=0): 108, ProcessCoord(pipe=27, data=0, model=1): 109, ProcessCoord(pipe=27, data=0, model=2): 110, ProcessCoord(pipe=27, data=0, model=3): 111, ProcessCoord(pipe=28, data=0, model=0): 112, ProcessCoord(pipe=28, data=0, model=1): 113, ProcessCoord(pipe=28, data=0, model=2): 114, ProcessCoord(pipe=28, data=0, model=3): 115, ProcessCoord(pipe=29, data=0, model=0): 116, ProcessCoord(pipe=29, data=0, model=1): 117, ProcessCoord(pipe=29, data=0, model=2): 118, ProcessCoord(pipe=29, data=0, model=3): 119, ProcessCoord(pipe=30, data=0, model=0): 120, ProcessCoord(pipe=30, data=0, model=1): 121, ProcessCoord(pipe=30, data=0, model=2): 122, ProcessCoord(pipe=30, data=0, model=3): 123, ProcessCoord(pipe=31, data=0, model=0): 124, ProcessCoord(pipe=31, data=0, model=1): 125, ProcessCoord(pipe=31, data=0, model=2): 126, ProcessCoord(pipe=31, data=0, 
model=3): 127}
[2022-01-26 19:08:02,384] [INFO] [module.py:365:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=5
     0: _to_float16
     1: EmbeddingPipe
     2:
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
stage=1 layers=2
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
stage=2 layers=2
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
stage=3 layers=2
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
stage=4 layers=2
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
stage=5 layers=2
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=6 layers=2
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
stage=7 layers=2
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
stage=8 layers=2
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
stage=9 layers=2
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
stage=10 layers=2
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
stage=11 layers=2
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
stage=12 layers=2
    27: ParallelTransformerLayerPipe
    28: ParallelTransformerLayerPipe
stage=13 layers=2
    29: ParallelTransformerLayerPipe
    30: ParallelTransformerLayerPipe
stage=14 layers=2
    31: ParallelTransformerLayerPipe
    32: ParallelTransformerLayerPipe
stage=15 layers=2
    33: ParallelTransformerLayerPipe
    34: ParallelTransformerLayerPipe
stage=16 layers=2
    35: ParallelTransformerLayerPipe
    36: ParallelTransformerLayerPipe
stage=17 layers=2
    37: ParallelTransformerLayerPipe
    38: ParallelTransformerLayerPipe
stage=18 layers=2
    39: ParallelTransformerLayerPipe
    40: ParallelTransformerLayerPipe
stage=19 layers=2
    41: ParallelTransformerLayerPipe
    42: ParallelTransformerLayerPipe
stage=20 layers=2
    43: ParallelTransformerLayerPipe
    44: ParallelTransformerLayerPipe
stage=21 layers=2
    45: ParallelTransformerLayerPipe
    46: ParallelTransformerLayerPipe
stage=22 layers=2
    47: ParallelTransformerLayerPipe
    48: ParallelTransformerLayerPipe
stage=23 layers=2
    49: ParallelTransformerLayerPipe
    50: ParallelTransformerLayerPipe
stage=24 layers=2
    51: ParallelTransformerLayerPipe
    52: ParallelTransformerLayerPipe
stage=25 layers=2
    53: ParallelTransformerLayerPipe
    54: ParallelTransformerLayerPipe
stage=26 layers=2
    55: ParallelTransformerLayerPipe
    56: ParallelTransformerLayerPipe
stage=27 layers=2
    57: ParallelTransformerLayerPipe
    58: ParallelTransformerLayerPipe
stage=28 layers=2
    59: ParallelTransformerLayerPipe
    60: ParallelTransformerLayerPipe
stage=29 layers=2
    61: ParallelTransformerLayerPipe
    62: ParallelTransformerLayerPipe
stage=30 layers=2
    63: ParallelTransformerLayerPipe
    64: ParallelTransformerLayerPipe
stage=31 layers=6
    65: ParallelTransformerLayerPipe
    66: ParallelTransformerLayerPipe
    67:
    68: MixedFusedLayerNorm
    69: EmbeddingPipe
    70: float16_to_fp32
  loss: CrossEntropy
[2022-01-26 19:08:03,078] [INFO] [utils.py:824:see_memory_usage] After Building Model
[2022-01-26 19:08:03,079] [INFO] [utils.py:825:see_memory_usage] MA 1.88 GB Max_MA 1.88 GB CA 1.91 GB Max_CA 2 GB
[2022-01-26 19:08:03,079] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 39.86 GB, percent = 7.9%
setting training iterations to 292968
> learning rate decay style: cosine
DeepSpeed is enabled.
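Note on reading the topology above: the `Using topology` mapping is consistent with a row-major layout over the axes (pipe, data, model), with `model` varying fastest. The sketch below reproduces that mapping for this run's sizes (32 pipeline stages, 1 data-parallel replica, 4 model-parallel ranks). The helper names `coord_to_rank` and `stage_ranks` are hypothetical and the formula is inferred from the logged values only; it is not taken from the DeepSpeed source.

```python
# Axis sizes read off the "Using topology" record in this log (assumption:
# the axis order is pipe, data, model, with model varying fastest).
NUM_PIPE = 32
NUM_DATA = 1
NUM_MODEL = 4


def coord_to_rank(pipe, data, model):
    """Row-major rank for a (pipe, data, model) coordinate (hypothetical helper)."""
    return (pipe * NUM_DATA + data) * NUM_MODEL + model


def stage_ranks(pipe):
    """All global ranks that belong to one pipeline stage (hypothetical helper)."""
    return [
        coord_to_rank(pipe, data, model)
        for data in range(NUM_DATA)
        for model in range(NUM_MODEL)
    ]


if __name__ == "__main__":
    print(coord_to_rank(1, 0, 0))   # rank logged for ProcessCoord(pipe=1, data=0, model=0)
    print(stage_ranks(0))           # ranks holding the first pipeline stage
    print(stage_ranks(NUM_PIPE - 1))  # ranks holding the last pipeline stage
```

Under this reading, stages 0 and 31 (the two stages that contain an `EmbeddingPipe` layer) map to ranks 0-3 and 124-127, which are the ranks that report the larger first partition size (978123600) in the `partition count` records.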
[2022-01-26 19:08:03,131] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed info: version=0.6.0+24fe7002, git-hash=24fe7002, git-branch=elastic-ckpt-refresh Rank: 104 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 105 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 110 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 32 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 111 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 125 partition count [1, 1] and sizes[(978123600, False), (214600, False)] Rank: 39 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 108 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 109 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 33 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 38 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 73 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 121 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 24 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 120 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 25 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 115 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 107 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 106 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 102 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 46 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 103 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 47 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 43 
partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 42 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 119 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 72 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 90 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 118 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 114 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 63 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 62 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 124 partition count [1, 1] and sizes[(978123600, False), (214600, False)] [2022-01-26 19:08:04,268] [INFO] [engine.py:275:__init__] DeepSpeed Flops Profiler Enabled: False [2022-01-26 19:08:04,268] [INFO] [engine.py:1093:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer [2022-01-26 19:08:04,268] [INFO] [engine.py:1099:_configure_optimizer] Using client Optimizer as basic optimizer [2022-01-26 19:08:04,268] [INFO] [engine.py:1115:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam [2022-01-26 19:08:04,268] [INFO] [utils.py:44:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type= [2022-01-26 19:08:04,268] [INFO] [logging.py:69:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer [2022-01-26 19:08:04,268] [INFO] [stage_1_and_2.py:121:__init__] Reduce bucket size 500000000 [2022-01-26 19:08:04,269] [INFO] [stage_1_and_2.py:122:__init__] Allgather bucket size 500000000 [2022-01-26 19:08:04,269] [INFO] [stage_1_and_2.py:123:__init__] CPU Offload: False [2022-01-26 19:08:04,269] [INFO] [stage_1_and_2.py:124:__init__] Round robin gradient partitioning: False Rank: 91 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 86 partition count [1, 1] and sizes[(807360000, False), 
(179800, False)] Rank: 87 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 13 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 12 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 22 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 76 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 77 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 94 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 53 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 52 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 71 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 23 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 95 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 74 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 75 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 50 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 51 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 56 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 70 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 57 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 14 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 9 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 21 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 20 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 15 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 8 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 49 
partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 93 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 69 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 92 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 48 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 6 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 64 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 7 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 65 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 113 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 112 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 68 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 82 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 83 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 30 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 99 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 31 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 79 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 98 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 67 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 101 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 78 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 100 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 123 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 122 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 66 partition count [1, 
1] and sizes[(807360000, False), (179800, False)] Rank: 80 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 55 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 54 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 81 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 40 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 41 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 1 partition count [1, 1] and sizes[(978123600, False), (191400, False)] Rank: 0 partition count [1, 1] and sizes[(978123600, False), (191400, False)] Rank: 16 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 17 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 60 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 61 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 44 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 84 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 27 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 45 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 85 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 26 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 37 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 116 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 36 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 117 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 10 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 11 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 96 partition count [1, 1] and sizes[(807360000, 
False), (179800, False)] Rank: 29 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 28 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 19 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 35 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 34 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 97 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 18 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 88 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 89 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 5 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 4 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 58 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 59 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 2 partition count [1, 1] and sizes[(978123600, False), (191400, False)] Rank: 3 partition count [1, 1] and sizes[(978123600, False), (191400, False)] Rank: 126 partition count [1, 1] and sizes[(978123600, False), (214600, False)] Rank: 127 partition count [1, 1] and sizes[(978123600, False), (214600, False)] [2022-01-26 19:08:08,255] [INFO] [utils.py:824:see_memory_usage] Before initializing optimizer states [2022-01-26 19:08:08,255] [INFO] [utils.py:825:see_memory_usage] MA 5.47 GB Max_MA 7.29 GB CA 9.25 GB Max_CA 9 GB [2022-01-26 19:08:08,255] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 39.65 GB, percent = 7.9% [2022-01-26 19:08:08,331] [INFO] [utils.py:824:see_memory_usage] After initializing optimizer states [2022-01-26 19:08:08,331] [INFO] [utils.py:825:see_memory_usage] MA 12.76 GB Max_MA 16.41 GB CA 20.19 GB Max_CA 20 GB [2022-01-26 19:08:08,331] [INFO] [utils.py:833:see_memory_usage] CPU Virtual 
Memory: used = 39.65 GB, percent = 7.9% [2022-01-26 19:08:08,331] [INFO] [stage_1_and_2.py:493:__init__] optimizer state initialized [2022-01-26 19:08:08,351] [INFO] [utils.py:824:see_memory_usage] After initializing ZeRO optimizer [2022-01-26 19:08:08,351] [INFO] [utils.py:825:see_memory_usage] MA 12.76 GB Max_MA 12.76 GB CA 20.19 GB Max_CA 20 GB [2022-01-26 19:08:08,351] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 39.65 GB, percent = 7.9% [2022-01-26 19:08:08,351] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam [2022-01-26 19:08:08,352] [INFO] [engine.py:805:_configure_lr_scheduler] DeepSpeed using client LR scheduler [2022-01-26 19:08:08,352] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed LR Scheduler = [2022-01-26 19:08:08,352] [INFO] [logging.py:69:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.95), (0.9, 0.95)] [2022-01-26 19:08:08,352] [INFO] [config.py:1058:print] DeepSpeedEngine configuration: [2022-01-26 19:08:08,352] [INFO] [config.py:1062:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2022-01-26 19:08:08,352] [INFO] [config.py:1062:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2022-01-26 19:08:08,352] [INFO] [config.py:1062:print] amp_enabled .................. False [2022-01-26 19:08:08,352] [INFO] [config.py:1062:print] amp_params ................... False [2022-01-26 19:08:08,352] [INFO] [config.py:1062:print] autotuning_config ............ 
{ "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": null, "exps_dir": null, "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 } [2022-01-26 19:08:08,352] [INFO] [config.py:1062:print] bfloat16_enabled ............. False [2022-01-26 19:08:08,352] [INFO] [config.py:1062:print] checkpoint_tag_validation_enabled True [2022-01-26 19:08:08,352] [INFO] [config.py:1062:print] checkpoint_tag_validation_fail False [2022-01-26 19:08:08,352] [INFO] [config.py:1062:print] communication_data_type ...... None [2022-01-26 19:08:08,352] [INFO] [config.py:1062:print] curriculum_enabled ........... True [2022-01-26 19:08:08,352] [INFO] [config.py:1062:print] curriculum_params ............ {'curriculum_type': 'seqlen', 'min_difficulty': 64, 'max_difficulty': 2048, 'schedule_type': 'fixed_linear', 'schedule_config': {'total_curriculum_step': 36000, 'difficulty_step': 8}} [2022-01-26 19:08:08,352] [INFO] [config.py:1062:print] dataloader_drop_last ......... False [2022-01-26 19:08:08,352] [INFO] [config.py:1062:print] disable_allgather ............ False [2022-01-26 19:08:08,352] [INFO] [config.py:1062:print] dump_state ................... False [2022-01-26 19:08:08,352] [INFO] [config.py:1062:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1} [2022-01-26 19:08:08,352] [INFO] [config.py:1062:print] eigenvalue_enabled ........... 
False [2022-01-26 19:08:08,352] [INFO] [config.py:1062:print] eigenvalue_gas_boundary_resolution 1 [2022-01-26 19:08:08,352] [INFO] [config.py:1062:print] eigenvalue_layer_name ........ bert.encoder.layer [2022-01-26 19:08:08,352] [INFO] [config.py:1062:print] eigenvalue_layer_num ......... 0 [2022-01-26 19:08:08,352] [INFO] [config.py:1062:print] eigenvalue_max_iter .......... 100 [2022-01-26 19:08:08,352] [INFO] [config.py:1062:print] eigenvalue_stability ......... 1e-06 [2022-01-26 19:08:08,352] [INFO] [config.py:1062:print] eigenvalue_tol ............... 0.01 [2022-01-26 19:08:08,352] [INFO] [config.py:1062:print] eigenvalue_verbose ........... False [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] elasticity_enabled ........... False [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] fp16_enabled ................. True [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] fp16_master_weights_and_gradients False [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] fp16_mixed_quantize .......... False [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] global_rank .................. 0 [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] gradient_accumulation_steps .. 2048 [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] gradient_clipping ............ 1.0 [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] gradient_predivide_factor .... 1.0 [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] initial_dynamic_scale ........ 4096 [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] loss_scale ................... 0 [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] memory_breakdown ............. False [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] optimizer_legacy_fusion ...... 
False [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] optimizer_name ............... None [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] optimizer_params ............. None [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] pld_enabled .................. False [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] pld_params ................... False [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] prescale_gradients ........... False [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] quantize_change_rate ......... 0.001 [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] quantize_groups .............. 1 [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] quantize_offset .............. 1000 [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] quantize_period .............. 1000 [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] quantize_rounding ............ 0 [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] quantize_start_bits .......... 16 [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] quantize_target_bits ......... 8 [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] quantize_training_enabled .... False [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] quantize_type ................ 0 [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] quantize_verbose ............. False [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] scheduler_name ............... None [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] scheduler_params ............. None [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] sparse_attention ............. None [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] sparse_gradients_enabled ..... 
False [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] steps_per_print .............. 2000 [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] tensorboard_enabled .......... False [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] tensorboard_job_name ......... DeepSpeedJobName [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] tensorboard_output_path ...... [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] train_batch_size ............. 2048 [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] train_micro_batch_size_per_gpu 1 [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] use_quantizer_kernel ......... False [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] wall_clock_breakdown ......... False [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] world_size ................... 1 [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] zero_allow_untested_optimizer False [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] zero_config .................. { "stage": 1, "contiguous_gradients": true, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": false, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false } [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] zero_enabled ................. True [2022-01-26 19:08:08,353] [INFO] [config.py:1062:print] zero_optimization_stage ...... 
1 [2022-01-26 19:08:08,353] [INFO] [config.py:1064:print] json = { "train_micro_batch_size_per_gpu": 1, "train_batch_size": 2.048000e+03, "gradient_clipping": 1.0, "elastic_checkpoint": true, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "curriculum_learning": { "enabled": true, "curriculum_type": "seqlen", "min_difficulty": 64, "max_difficulty": 2.048000e+03, "schedule_type": "fixed_linear", "schedule_config": { "total_curriculum_step": 3.600000e+04, "difficulty_step": 8 } }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false } [2022-01-26 19:08:08,354] [INFO] [engine.py:93:__init__] CONFIG: micro_batches=2048 micro_batch_size=1 [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=5 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=1 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=4 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=3 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=6 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=2 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 
(104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=7 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=66 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=67 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=64 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=0 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=71 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=65 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=70 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=68 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=99 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 
(807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=102 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=103 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=101 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=69 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=98 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=34 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=39 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=35 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=38 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] 
[engine.py:151:__init__] RANK=100 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=96 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=36 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=32 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=37 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=33 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=87 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=82 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=81 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=86 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 
(104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=52 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=48 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=19 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=16 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=23 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=17 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=97 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=83 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=119 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=113 STAGE=28 
LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=115 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=51 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=55 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=84 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=80 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=117 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=116 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=118 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=114 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 
(104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=49 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=50 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=54 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=20 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=18 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=22 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=21 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=88 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=93 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=94 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 
(807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=89 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=10 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=9 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=13 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=11 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=47 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=40 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=46 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=41 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] 
[engine.py:151:__init__] RANK=109 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=108 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=110 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=111 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=85 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=123 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=122 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=124 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=127 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=112 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) 
TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=53 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=26 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=30 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=95 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=90 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=92 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=79 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=75 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=76 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] 
[engine.py:151:__init__] RANK=72 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=8 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=42 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=44 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=106 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=107 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=121 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=120 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=27 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=25 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 
(104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=63 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=56 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=91 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=73 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=12 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=43 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=104 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=125 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=29 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=24 STAGE=6 
LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=31 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=62 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=59 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=58 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=57 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=77 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=78 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=14 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=45 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 
(104048.288M)
[2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=105 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=126 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=28 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=60 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=74 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=15 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-01-26 19:08:10,670] [INFO] [engine.py:151:__init__] RANK=61 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
> using checkpoint value 6e-05 for learning rate
> using checkpoint value 6e-06 for minimum learning rate
> using checkpoint value 3750000 for warmup iterations
> using checkpoint value 600000000 for total number of iterations
> using checkpoint value cosine for decay style
[2022-01-26 19:08:35,058] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 80
[2022-01-26 19:08:36,016] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 35
[2022-01-26 19:08:36,401] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 26 [2022-01-26 19:08:36,445] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 80 [2022-01-26 19:08:36,648] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 34 [2022-01-26 19:08:36,714] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 67 [2022-01-26 19:08:37,442] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 35 [2022-01-26 19:08:37,542] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 107 [2022-01-26 19:08:37,806] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 26 [2022-01-26 19:08:37,817] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 108 [2022-01-26 19:08:37,937] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 52 [2022-01-26 19:08:37,959] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 45 [2022-01-26 19:08:38,114] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 34 [2022-01-26 19:08:38,116] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 77 [2022-01-26 19:08:38,135] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 67 [2022-01-26 19:08:38,411] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 76 [2022-01-26 19:08:38,631] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 55 [2022-01-26 19:08:38,690] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 111 [2022-01-26 
19:08:38,930] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 107 [2022-01-26 19:08:38,940] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 40 [2022-01-26 19:08:38,981] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 41 [2022-01-26 19:08:39,249] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 108 [2022-01-26 19:08:39,335] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 45 [2022-01-26 19:08:39,369] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 52 [2022-01-26 19:08:39,523] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 22 [2022-01-26 19:08:39,593] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 77 [2022-01-26 19:08:39,723] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 36 [2022-01-26 19:08:39,756] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 72 [2022-01-26 19:08:39,885] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 28 [2022-01-26 19:08:39,939] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 8 [2022-01-26 19:08:40,013] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 76 [2022-01-26 19:08:40,052] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 114 [2022-01-26 19:08:40,060] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 55 [2022-01-26 19:08:40,111] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 111 [2022-01-26 19:08:40,186] [INFO] 
[engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 103 [2022-01-26 19:08:40,327] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 90 [2022-01-26 19:08:40,384] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 99 [2022-01-26 19:08:40,518] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 11 [2022-01-26 19:08:40,659] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 40 [2022-01-26 19:08:40,707] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 41 [2022-01-26 19:08:40,760] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 100 [2022-01-26 19:08:40,989] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 22 [2022-01-26 19:08:41,010] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 123 [2022-01-26 19:08:41,097] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 4 [2022-01-26 19:08:41,128] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 36 [2022-01-26 19:08:41,171] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 72 [2022-01-26 19:08:41,214] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 101 [2022-01-26 19:08:41,257] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 49 [2022-01-26 19:08:41,312] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 28 [2022-01-26 19:08:41,409] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 94 [2022-01-26 19:08:41,422] [INFO] 
[engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 114 [2022-01-26 19:08:41,485] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 5 [2022-01-26 19:08:41,703] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 29 [2022-01-26 19:08:41,725] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 7 [2022-01-26 19:08:41,727] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 90 [2022-01-26 19:08:41,742] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 103 [2022-01-26 19:08:41,790] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 16 [2022-01-26 19:08:41,812] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 8 [2022-01-26 19:08:41,871] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 87 [2022-01-26 19:08:41,895] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 99 [2022-01-26 19:08:42,023] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 11 [2022-01-26 19:08:42,163] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 100 [2022-01-26 19:08:42,223] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 60 [2022-01-26 19:08:42,450] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 123 [2022-01-26 19:08:42,478] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 119 [2022-01-26 19:08:42,656] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 49 [2022-01-26 19:08:42,731] [INFO] 
[engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 4 [2022-01-26 19:08:42,810] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 94 [2022-01-26 19:08:42,931] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 121 [2022-01-26 19:08:43,000] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 43 [2022-01-26 19:08:43,001] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 21 [2022-01-26 19:08:43,066] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 101 [2022-01-26 19:08:43,068] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 109 [2022-01-26 19:08:43,096] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 29 [2022-01-26 19:08:43,161] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 5 [2022-01-26 19:08:43,177] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 106 [2022-01-26 19:08:43,183] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 16 [2022-01-26 19:08:43,230] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 91 [2022-01-26 19:08:43,259] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 87 [2022-01-26 19:08:43,401] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 24 [2022-01-26 19:08:43,408] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 7 [2022-01-26 19:08:43,551] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 17 [2022-01-26 19:08:43,636] [INFO] 
[engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 60 [2022-01-26 19:08:43,647] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 78 [2022-01-26 19:08:43,683] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 89 [2022-01-26 19:08:43,690] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 124 [2022-01-26 19:08:43,696] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 96 [2022-01-26 19:08:43,869] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 119 [2022-01-26 19:08:43,889] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 54 [2022-01-26 19:08:43,926] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 70 [2022-01-26 19:08:44,041] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 82 [2022-01-26 19:08:44,133] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 110 [2022-01-26 19:08:44,211] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 75 [2022-01-26 19:08:44,231] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 95 [2022-01-26 19:08:44,233] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 92 [2022-01-26 19:08:44,295] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 113 [2022-01-26 19:08:44,348] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 121 [2022-01-26 19:08:44,384] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 43 [2022-01-26 19:08:44,389] [INFO] 
[engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 81 [2022-01-26 19:08:44,415] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 25 [2022-01-26 19:08:44,418] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 58 [2022-01-26 19:08:44,513] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 6 [2022-01-26 19:08:44,525] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 23 [2022-01-26 19:08:44,580] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 10 [2022-01-26 19:08:44,725] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 106 [2022-01-26 19:08:44,732] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 122 [2022-01-26 19:08:44,799] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 91 [2022-01-26 19:08:44,800] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 21 [2022-01-26 19:08:44,801] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 109 [2022-01-26 19:08:44,826] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 24 [2022-01-26 19:08:44,934] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 120 [2022-01-26 19:08:44,934] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 57 [2022-01-26 19:08:44,971] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 17 [2022-01-26 19:08:45,069] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 96 [2022-01-26 19:08:45,139] [INFO] 
[engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 32 [2022-01-26 19:08:45,238] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 73 [2022-01-26 19:08:45,253] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 78 [2022-01-26 19:08:45,276] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 89 [2022-01-26 19:08:45,281] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 14 [2022-01-26 19:08:45,303] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 54 [2022-01-26 19:08:45,303] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 56 [2022-01-26 19:08:45,444] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 124 [2022-01-26 19:08:45,457] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 70 [2022-01-26 19:08:45,534] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 48 [2022-01-26 19:08:45,555] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 110 [2022-01-26 19:08:45,557] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 53 [2022-01-26 19:08:45,600] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 82 [2022-01-26 19:08:45,617] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 92 [2022-01-26 19:08:45,691] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 113 [2022-01-26 19:08:45,786] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 75 [2022-01-26 19:08:45,796] [INFO] 
[engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 95 [2022-01-26 19:08:45,848] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 33 [2022-01-26 19:08:45,868] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 58 [2022-01-26 19:08:45,881] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 84 [2022-01-26 19:08:45,902] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 6 [2022-01-26 19:08:45,989] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 81 [2022-01-26 19:08:46,007] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 10 [2022-01-26 19:08:46,178] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 25 [2022-01-26 19:08:46,240] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 122 [2022-01-26 19:08:46,251] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 23 [2022-01-26 19:08:46,298] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 1 [2022-01-26 19:08:46,370] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 85 [2022-01-26 19:08:46,389] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 120 [2022-01-26 19:08:46,417] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 57 [2022-01-26 19:08:46,442] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 47 [2022-01-26 19:08:46,464] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 98 [2022-01-26 19:08:46,542] [INFO] 
[engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 102 [2022-01-26 19:08:46,575] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 66 [2022-01-26 19:08:46,583] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 32 [2022-01-26 19:08:46,659] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 73 [2022-01-26 19:08:46,659] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 14 [2022-01-26 19:08:46,772] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 44 [2022-01-26 19:08:46,821] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 39 [2022-01-26 19:08:46,854] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 13 [2022-01-26 19:08:47,018] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 56 [2022-01-26 19:08:47,034] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 37 [2022-01-26 19:08:47,162] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 48 [2022-01-26 19:08:47,218] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 63 [2022-01-26 19:08:47,283] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 84 [2022-01-26 19:08:47,287] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 115 [2022-01-26 19:08:47,355] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 33 [2022-01-26 19:08:47,529] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 15 [2022-01-26 19:08:47,548] [INFO] 
[engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 53 [2022-01-26 19:08:47,746] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 42 [2022-01-26 19:08:47,804] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 47 [2022-01-26 19:08:47,805] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 85 [2022-01-26 19:08:47,846] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 98 [2022-01-26 19:08:47,905] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 102 [2022-01-26 19:08:47,942] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 20 [2022-01-26 19:08:48,004] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 1 [2022-01-26 19:08:48,032] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 50 [2022-01-26 19:08:48,096] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 66 [2022-01-26 19:08:48,161] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 44 [2022-01-26 19:08:48,163] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 118 [2022-01-26 19:08:48,169] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 127 [2022-01-26 19:08:48,247] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 13 [2022-01-26 19:08:48,374] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 39 [2022-01-26 19:08:48,398] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 68 [2022-01-26 19:08:48,481] [INFO] 
[engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 31 [2022-01-26 19:08:48,513] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 65 [2022-01-26 19:08:48,571] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 71 [2022-01-26 19:08:48,578] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 63 [2022-01-26 19:08:48,669] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 115 [2022-01-26 19:08:48,847] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 38 [2022-01-26 19:08:48,867] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 37 [2022-01-26 19:08:48,954] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 15 [2022-01-26 19:08:49,120] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 88 [2022-01-26 19:08:49,142] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 42 [2022-01-26 19:08:49,240] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 61 [2022-01-26 19:08:49,290] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 74 [2022-01-26 19:08:49,303] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 116 [2022-01-26 19:08:49,331] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 79 [2022-01-26 19:08:49,392] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 20 [2022-01-26 19:08:49,399] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 50 [2022-01-26 19:08:49,512] [INFO] 
[engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 51 [2022-01-26 19:08:49,601] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 118 [2022-01-26 19:08:49,622] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 12 [2022-01-26 19:08:49,819] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 68 [2022-01-26 19:08:49,881] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 31 [2022-01-26 19:08:49,960] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 97 [2022-01-26 19:08:49,984] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 62 [2022-01-26 19:08:49,984] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 127 [2022-01-26 19:08:50,065] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 104 [2022-01-26 19:08:50,191] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 9 [2022-01-26 19:08:50,208] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 38 [2022-01-26 19:08:50,252] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 65 [2022-01-26 19:08:50,296] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 27 [2022-01-26 19:08:50,310] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 71 [2022-01-26 19:08:50,361] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 93 [2022-01-26 19:08:50,475] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 83 [2022-01-26 19:08:50,571] [INFO] 
[engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 88 [2022-01-26 19:08:50,617] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 61 [2022-01-26 19:08:50,639] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 117 [2022-01-26 19:08:50,650] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 74 [2022-01-26 19:08:50,677] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 116 [2022-01-26 19:08:50,686] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 79 [2022-01-26 19:08:50,903] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 30 [2022-01-26 19:08:50,999] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 12 [2022-01-26 19:08:51,054] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 64 [2022-01-26 19:08:51,071] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 0 [2022-01-26 19:08:51,085] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 51 [2022-01-26 19:08:51,359] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 97 [2022-01-26 19:08:51,397] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 62 [2022-01-26 19:08:51,446] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 104 [2022-01-26 19:08:51,541] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 69 [2022-01-26 19:08:51,556] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 9 [2022-01-26 19:08:51,795] [INFO] [engine.py:2602:_load_zero_checkpoint] 
loading 1 zero partition checkpoints for rank 93 [2022-01-26 19:08:51,814] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 27 [2022-01-26 19:08:51,869] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 83 [2022-01-26 19:08:52,024] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 117 [2022-01-26 19:08:52,176] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 126 [2022-01-26 19:08:52,247] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 59 [2022-01-26 19:08:52,402] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 30 [2022-01-26 19:08:52,537] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 105 [2022-01-26 19:08:52,572] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 64 [2022-01-26 19:08:52,776] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 0 checkpoint version 3.0 [2022-01-26 19:08:52,918] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 46 [2022-01-26 19:08:52,929] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 2 [2022-01-26 19:08:52,930] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 69 [2022-01-26 19:08:52,978] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 86 [2022-01-26 19:08:53,699] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 19 [2022-01-26 19:08:53,707] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 59 [2022-01-26 19:08:53,845] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 
zero partition checkpoints for rank 126 [2022-01-26 19:08:53,919] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 105 [2022-01-26 19:08:54,280] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 46 [2022-01-26 19:08:54,351] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 86 [2022-01-26 19:08:54,366] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 112 [2022-01-26 19:08:54,583] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 2 [2022-01-26 19:08:55,100] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 125 [2022-01-26 19:08:55,109] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 19 [2022-01-26 19:08:55,305] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 18 [2022-01-26 19:08:55,499] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 3 [2022-01-26 19:08:55,734] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 112 [2022-01-26 19:08:56,679] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 18 [2022-01-26 19:08:56,801] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 125 [2022-01-26 19:08:57,154] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 3 successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 at iteration 15000 time (ms) | load-checkpoint: 45190.62 /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/utils.py:279: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last 
stage hold several copies of the embeddings
  warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
estimated model parameters: 125.22432
estimated model parameters without embeddings: 103.368064
[after model, optimizer, and learning rate scheduler are built] datetime: 2022-01-26 19:08:57
> building train, validation, and test datasets ...
 > datasets target sizes (minimum size):
    train: 600000000
    validation: 20008960
    test: 10240
> building train, validation, and test datasets for GPT ...
 > building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
 > finished creating indexed dataset in 0.055566 seconds
    number of documents: 304230423
 > dataset split:
    train: document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test: document indices in [303926193, 304230423) total of 304230 documents
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_shuffle_idx.npy
    loaded indexed file in 0.174 seconds
    total number of samples: 657686117
    total number of epochs: 5
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_20008960ns_2048sl_43s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_20008960ns_2048sl_43s_sample_idx.npy
 > loading shuffle-idx
mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_20008960ns_2048sl_43s_shuffle_idx.npy
    loaded indexed file in 0.147 seconds
    total number of samples: 20781483
    total number of epochs: 3
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_shuffle_idx.npy
    loaded indexed file in 0.071 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2022-01-26 19:09:04
done with setup ...
training ...
Number of parameters: [tensor rank - pipeline rank] w/ and w/o embeddings:
time (ms) | model-and-optimizer-setup: 56493.91 | train/valid/test-data-iterators-setup: 6885.90
[003-001] 103.3651B / 103.3651B
[002-001] 103.3651B / 103.3651B
[001-000] 125.2243B / 103.3681B
[002-000] 125.2243B / 103.3681B
[002-030] 103.3651B / 103.3651B
[002-017] 103.3651B / 103.3651B
[003-017] 103.3651B / 103.3651B
[001-017] 103.3651B / 103.3651B
[002-009] 103.3651B / 103.3651B
[002-008] 103.3651B / 103.3651B
[003-008] 103.3651B / 103.3651B
[003-009] 103.3651B / 103.3651B
[001-001] 103.3651B / 103.3651B
[003-000] 125.2243B / 103.3681B
[003-010] 103.3651B / 103.3651B
[002-010] 103.3651B / 103.3651B
[001-021] 103.3651B / 103.3651B
[003-020] 103.3651B / 103.3651B
[001-030] 103.3651B / 103.3651B
[003-028] 103.3651B / 103.3651B
[001-005] 103.3651B / 103.3651B
[001-004] 103.3651B / 103.3651B
[003-005] 103.3651B / 103.3651B
[002-007] 103.3651B / 103.3651B
[003-006] 103.3651B / 103.3651B
[001-006] 103.3651B / 103.3651B
[002-014] 103.3651B / 103.3651B
[001-015] 103.3651B / 103.3651B
[001-014] 103.3651B / 103.3651B
[001-022] 103.3651B / 103.3651B
[001-023] 103.3651B / 103.3651B
[002-023] 103.3651B / 103.3651B
[002-024] 103.3651B / 103.3651B
[002-016] 103.3651B / 103.3651B
[001-016] 103.3651B / 103.3651B
[002-003] 103.3651B / 103.3651B
[002-002] 103.3651B / 103.3651B
[000-001] 103.3651B / 103.3651B
[001-011] 103.3651B / 103.3651B
[002-026] 103.3651B / 103.3651B
[003-026] 103.3651B / 103.3651B
[001-026] 103.3651B / 103.3651B
[002-021] 103.3651B / 103.3651B
[002-020] 103.3651B / 103.3651B
[001-020] 103.3651B / 103.3651B
[003-021] 103.3651B / 103.3651B
[002-031] 125.2273B / 103.3710B
[002-028] 103.3651B / 103.3651B
[001-029] 103.3651B / 103.3651B
[001-012] 103.3651B / 103.3651B
[002-013] 103.3651B / 103.3651B
[002-012] 103.3651B / 103.3651B
[001-013] 103.3651B / 103.3651B
[003-004] 103.3651B / 103.3651B
[002-004] 103.3651B / 103.3651B
[001-007] 103.3651B / 103.3651B
[003-007] 103.3651B / 103.3651B
[003-015] 103.3651B / 103.3651B
[003-014] 103.3651B / 103.3651B
[002-015] 103.3651B / 103.3651B
[002-022] 103.3651B / 103.3651B
[003-023] 103.3651B / 103.3651B
[001-019] 103.3651B / 103.3651B
[002-018] 103.3651B / 103.3651B
[002-019] 103.3651B / 103.3651B
[001-018] 103.3651B / 103.3651B
[001-025] 103.3651B / 103.3651B
[002-025] 103.3651B / 103.3651B
[003-025] 103.3651B / 103.3651B
[003-024] 103.3651B / 103.3651B
[001-024] 103.3651B / 103.3651B
[003-016] 103.3651B / 103.3651B
[001-009] 103.3651B / 103.3651B
[001-002] 103.3651B / 103.3651B
[003-002] 103.3651B / 103.3651B
[001-003] 103.3651B / 103.3651B
[003-003] 103.3651B / 103.3651B
[002-011] 103.3651B / 103.3651B
[003-011] 103.3651B / 103.3651B
[003-027] 103.3651B / 103.3651B
[001-027] 103.3651B / 103.3651B
[000-020] 103.3651B / 103.3651B
[003-030] 103.3651B / 103.3651B
[001-028] 103.3651B / 103.3651B
[003-029] 103.3651B / 103.3651B
[003-013] 103.3651B / 103.3651B
[003-012] 103.3651B / 103.3651B
[002-005] 103.3651B / 103.3651B
[002-006] 103.3651B / 103.3651B
[000-014] 103.3651B / 103.3651B
[003-022] 103.3651B / 103.3651B
[003-018] 103.3651B / 103.3651B
[000-016] 103.3651B / 103.3651B
[000-017] 103.3651B / 103.3651B
[001-008] 103.3651B / 103.3651B
[000-002] 103.3651B / 103.3651B
[000-000] 125.2243B / 103.3681B
[001-010] 103.3651B / 103.3651B
[002-027] 103.3651B / 103.3651B
[000-021] 103.3651B / 103.3651B
[003-031] 125.2273B / 103.3710B
[002-029] 103.3651B / 103.3651B
[000-013] 103.3651B / 103.3651B
[000-005] 103.3651B / 103.3651B
[000-006] 103.3651B / 103.3651B
[000-007] 103.3651B / 103.3651B
[000-015] 103.3651B / 103.3651B
[000-023] 103.3651B / 103.3651B
[003-019] 103.3651B / 103.3651B
[000-024] 103.3651B / 103.3651B
[000-009] 103.3651B / 103.3651B
[000-010] 103.3651B / 103.3651B
[000-027] 103.3651B / 103.3651B
[001-031] 125.2273B / 103.3710B
[000-029] 103.3651B / 103.3651B
[000-004] 103.3651B / 103.3651B
[000-022] 103.3651B / 103.3651B
[000-019] 103.3651B / 103.3651B
[000-025] 103.3651B / 103.3651B
[000-003] 103.3651B / 103.3651B
[000-011] 103.3651B / 103.3651B
[000-026] 103.3651B / 103.3651B
[000-030] 103.3651B / 103.3651B
[000-028] 103.3651B / 103.3651B
[000-008] 103.3651B / 103.3651B
[000-031] 125.2273B / 103.3710B
[000-018] 103.3651B / 103.3651B
[000-012] 103.3651B / 103.3651B
[before the start of training step] datetime: 2022-01-26 19:09:04
[2022-01-26 19:09:04,852] [INFO] [checkpointing.py:547:forward] Activation Checkpointing Information
[2022-01-26 19:09:04,852] [INFO] [checkpointing.py:548:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2022-01-26 19:09:04,852] [INFO] [checkpointing.py:551:forward] ----contiguous Memory Checkpointing False with 64 total layers
[2022-01-26 19:09:04,852] [INFO] [checkpointing.py:554:forward] ----Synchronization False
[2022-01-26 19:09:04,852] [INFO] [checkpointing.py:555:forward] ----Profiling time in checkpointing False
[Rank 0] (after 15001 iterations) memory (MB) | allocated: 13222.74267578125 | max allocated: 20686.35888671875 | reserved: 24404.0 | max reserved: 24404.0
[Rank 4] (after 15001 iterations) memory (MB) | allocated:
10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 124] (after 15001 iterations) memory (MB) | allocated: 13239.271484375 | max allocated: 20702.9326171875 | reserved: 24404.0 | max reserved: 24404.0 [Rank 3] (after 15001 iterations) memory (MB) | allocated: 13228.71923828125 | max allocated: 20692.33544921875 | reserved: 24404.0 | max reserved: 24404.0[Rank 7] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 8] (after 15001 iterations) memory (MB) | allocated: 10796.8388671875 | max allocated: 16957.0205078125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 1] (after 15001 iterations) memory (MB) | allocated: 13222.74267578125 | max allocated: 20686.35888671875 | reserved: 24404.0 | max reserved: 24404.0 [Rank 125] (after 15001 iterations) memory (MB) | allocated: 13239.271484375 | max allocated: 20702.9326171875 | reserved: 24404.0 | max reserved: 24404.0 [Rank 5] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 12] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 iteration 15001/ 292968 | consumed samples: 30722048 | consumed tokens: 14543749120 | elapsed time per iteration (ms): 231601.3 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.695227E+00 | loss scale: 262144.0 | grad norm: 172370.614 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.009 | TFLOPs: 50.73 | [Rank 6] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16957.3212890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 2] (after 15001 iterations) memory (MB) | allocated: 13223.984375 | max allocated: 
20687.6005859375 | reserved: 24404.0 | max reserved: 24404.0 [Rank 127] (after 15001 iterations) memory (MB) | allocated: 13239.271484375 | max allocated: 20702.9326171875 | reserved: 24404.0 | max reserved: 24404.0 [Rank 126] (after 15001 iterations) memory (MB) | allocated: 13239.271484375 | max allocated: 20703.6201171875 | reserved: 24404.0 | max reserved: 24404.0 [Rank 11] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 15] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 10] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 36] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 32] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 28] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0[Rank 24] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 48] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0[Rank 52] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 16] (after 15001 iterations) memory (MB) | allocated: 10796.8388671875 | max allocated: 16957.0205078125 | reserved: 20072.0 | max reserved: 20072.0[Rank 20] (after 15001 iterations) memory (MB) | allocated: 
10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 60] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 44] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 40] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 84] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 76] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 64] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 88] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 56] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 92] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 96] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 80] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 100] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 68] (after 
15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 108] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 72] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 104] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 120] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 116] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 112] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 14] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 9] (after 15001 iterations) memory (MB) | allocated: 10796.8388671875 | max allocated: 16957.0205078125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 13] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 27] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 23] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 35] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 
20072.0 | max reserved: 20072.0 [Rank 31] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 43] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 63] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0[Rank 59] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 55] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0[Rank 51] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 71] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 19] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 39] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 47] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 67] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 83] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 75] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max 
allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 95] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 79] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 87] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 99] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 103] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 91] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 107] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 119] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 111] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 115] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 123] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 22] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 21] (after 15001 iterations) 
memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 30] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 38] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0[Rank 34] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 17] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 46] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 26] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 42] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 50] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 54] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 18] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 62] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 82] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 
20072.0 [Rank 70] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 66] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 58] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 94] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 78] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 74] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 86] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 114] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 90] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 106] (after 15001 iterations) memory (MB) | allocated: 10796.8388671875 | max allocated: 16957.0205078125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 102] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 110] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 98] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 
16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 118] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 122] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 29] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 33] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 37] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 25] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 53] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 57] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 45] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 61] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 65] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 49] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 69] (after 15001 iterations) memory (MB) | 
allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 81] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 73] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 41] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 93] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 89] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 77] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 97] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 85] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 113] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 105] (after 15001 iterations) memory (MB) | allocated: 10796.8388671875 | max allocated: 16957.0205078125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 101] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 121] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 
109] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 117] (after 15001 iterations) memory (MB) | allocated: 10796.22998046875 | max allocated: 16956.41162109375 | reserved: 20072.0 | max reserved: 20072.0 iteration 15002/ 292968 | consumed samples: 30724096 | consumed tokens: 14545567744 | elapsed time per iteration (ms): 158765.7 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.734951E+00 | loss scale: 262144.0 | grad norm: 187802.534 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 74.01 | iteration 15003/ 292968 | consumed samples: 30726144 | consumed tokens: 14547386368 | elapsed time per iteration (ms): 159239.1 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.717034E+00 | loss scale: 262144.0 | grad norm: 198015.959 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 73.79 | iteration 15004/ 292968 | consumed samples: 30728192 | consumed tokens: 14549204992 | elapsed time per iteration (ms): 152052.8 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.705877E+00 | loss scale: 262144.0 | grad norm: 188953.119 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 77.27 | iteration 15005/ 292968 | consumed samples: 30730240 | consumed tokens: 14551023616 | elapsed time per iteration (ms): 148800.7 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.702031E+00 | loss scale: 262144.0 | grad norm: 162012.634 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 78.96 | iteration 15006/ 292968 | consumed samples: 30732288 | consumed tokens: 
14552842240 | elapsed time per iteration (ms): 145447.2 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.716557E+00 | loss scale: 262144.0 | grad norm: 159805.163 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 80.78 | iteration 15007/ 292968 | consumed samples: 30734336 | consumed tokens: 14554660864 | elapsed time per iteration (ms): 140715.8 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.697146E+00 | loss scale: 262144.0 | grad norm: 188602.403 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 83.50 | iteration 15008/ 292968 | consumed samples: 30736384 | consumed tokens: 14556479488 | elapsed time per iteration (ms): 142624.2 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.711441E+00 | loss scale: 262144.0 | grad norm: 186407.067 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 82.38 | iteration 15009/ 292968 | consumed samples: 30738432 | consumed tokens: 14558298112 | elapsed time per iteration (ms): 139879.4 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.712217E+00 | loss scale: 262144.0 | grad norm: 103740.073 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 84.00 | iteration 15010/ 292968 | consumed samples: 30740480 | consumed tokens: 14560116736 | elapsed time per iteration (ms): 135647.3 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.710491E+00 | loss scale: 262144.0 | grad norm: 110161.712 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 86.62 | iteration 15011/ 292968 | consumed samples: 
30742528 | consumed tokens: 14561935360 | elapsed time per iteration (ms): 136901.6 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.682187E+00 | loss scale: 262144.0 | grad norm: 156679.488 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 85.82 | iteration 15012/ 292968 | consumed samples: 30744576 | consumed tokens: 14563753984 | elapsed time per iteration (ms): 135584.0 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.717501E+00 | loss scale: 262144.0 | grad norm: 151335.476 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 86.66 | iteration 15013/ 292968 | consumed samples: 30746624 | consumed tokens: 14565572608 | elapsed time per iteration (ms): 142650.9 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.713209E+00 | loss scale: 262144.0 | grad norm: 127339.714 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 82.37 | iteration 15014/ 292968 | consumed samples: 30748672 | consumed tokens: 14567391232 | elapsed time per iteration (ms): 139120.9 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.715857E+00 | loss scale: 262144.0 | grad norm: 102326.877 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 84.46 | iteration 15015/ 292968 | consumed samples: 30750720 | consumed tokens: 14569209856 | elapsed time per iteration (ms): 136610.3 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.679760E+00 | loss scale: 262144.0 | grad norm: 133019.068 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 86.01 | iteration 15016/ 292968 
| consumed samples: 30752768 | consumed tokens: 14571028480 | elapsed time per iteration (ms): 137971.1 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.711917E+00 | loss scale: 262144.0 | grad norm: 173765.843 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 85.16 | iteration 15017/ 292968 | consumed samples: 30754816 | consumed tokens: 14572847104 | elapsed time per iteration (ms): 138098.3 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.709079E+00 | loss scale: 262144.0 | grad norm: 186889.653 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 85.08 | iteration 15018/ 292968 | consumed samples: 30756864 | consumed tokens: 14574665728 | elapsed time per iteration (ms): 137482.5 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.697008E+00 | loss scale: 262144.0 | grad norm: 206104.323 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 85.46 | iteration 15019/ 292968 | consumed samples: 30758912 | consumed tokens: 14576484352 | elapsed time per iteration (ms): 133273.6 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.678800E+00 | loss scale: 262144.0 | grad norm: 221155.598 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 88.16 | iteration 15020/ 292968 | consumed samples: 30760960 | consumed tokens: 14578302976 | elapsed time per iteration (ms): 135868.3 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.706548E+00 | loss scale: 262144.0 | grad norm: 214749.201 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 86.48 | 
iteration 15021/ 292968 | consumed samples: 30763008 | consumed tokens: 14580121600 | elapsed time per iteration (ms): 133048.9 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.712150E+00 | loss scale: 262144.0 | grad norm: 179140.736 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 88.31 | iteration 15022/ 292968 | consumed samples: 30765056 | consumed tokens: 14581940224 | elapsed time per iteration (ms): 131521.4 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.718051E+00 | loss scale: 262144.0 | grad norm: 129558.939 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 89.34 | iteration 15023/ 292968 | consumed samples: 30767104 | consumed tokens: 14583758848 | elapsed time per iteration (ms): 132134.2 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.727764E+00 | loss scale: 262144.0 | grad norm: 182257.622 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 88.92 | iteration 15024/ 292968 | consumed samples: 30769152 | consumed tokens: 14585577472 | elapsed time per iteration (ms): 139764.2 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.724793E+00 | loss scale: 262144.0 | grad norm: 247993.789 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 84.07 | iteration 15025/ 292968 | consumed samples: 30771200 | consumed tokens: 14587396096 | elapsed time per iteration (ms): 126000.4 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.726327E+00 | loss scale: 262144.0 | grad norm: 210116.770 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
0.016 | TFLOPs: 93.25 | iteration 15026/ 292968 | consumed samples: 30773248 | consumed tokens: 14589214720 | elapsed time per iteration (ms): 127569.3 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.745291E+00 | loss scale: 262144.0 | grad norm: 138028.953 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 92.10 | iteration 15027/ 292968 | consumed samples: 30775296 | consumed tokens: 14591033344 | elapsed time per iteration (ms): 127809.6 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.719173E+00 | loss scale: 262144.0 | grad norm: 150654.474 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 91.93 | iteration 15028/ 292968 | consumed samples: 30777344 | consumed tokens: 14592851968 | elapsed time per iteration (ms): 127200.2 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.756075E+00 | loss scale: 262144.0 | grad norm: 146127.648 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 92.37 | iteration 15029/ 292968 | consumed samples: 30779392 | consumed tokens: 14594670592 | elapsed time per iteration (ms): 126549.8 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.725291E+00 | loss scale: 262144.0 | grad norm: 147069.473 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 92.85 | iteration 15030/ 292968 | consumed samples: 30781440 | consumed tokens: 14596489216 | elapsed time per iteration (ms): 123446.6 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.736372E+00 | loss scale: 262144.0 | grad norm: 157563.251 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 0.017 | TFLOPs: 95.18 | iteration 15031/ 292968 | consumed samples: 30783488 | consumed tokens: 14598307840 | elapsed time per iteration (ms): 124380.2 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.687201E+00 | loss scale: 262144.0 | grad norm: 133271.862 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.46 | iteration 15032/ 292968 | consumed samples: 30785536 | consumed tokens: 14600126464 | elapsed time per iteration (ms): 120751.3 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.711807E+00 | loss scale: 262144.0 | grad norm: 104047.159 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.30 | iteration 15033/ 292968 | consumed samples: 30787584 | consumed tokens: 14601945088 | elapsed time per iteration (ms): 122909.1 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.715228E+00 | loss scale: 262144.0 | grad norm: 111510.385 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 95.60 | iteration 15034/ 292968 | consumed samples: 30789632 | consumed tokens: 14603763712 | elapsed time per iteration (ms): 123194.2 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.731513E+00 | loss scale: 262144.0 | grad norm: 144650.794 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 95.37 | iteration 15035/ 292968 | consumed samples: 30791680 | consumed tokens: 14605582336 | elapsed time per iteration (ms): 127359.6 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.749392E+00 | loss scale: 262144.0 | grad norm: 202132.283 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 0.016 | TFLOPs: 92.26 | iteration 15036/ 292968 | consumed samples: 30793728 | consumed tokens: 14607400960 | elapsed time per iteration (ms): 123310.8 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.758457E+00 | loss scale: 262144.0 | grad norm: 278257.728 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 95.28 | iteration 15037/ 292968 | consumed samples: 30795776 | consumed tokens: 14609219584 | elapsed time per iteration (ms): 122398.1 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.726472E+00 | loss scale: 262144.0 | grad norm: 233311.549 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 95.99 | iteration 15038/ 292968 | consumed samples: 30797824 | consumed tokens: 14611038208 | elapsed time per iteration (ms): 119390.6 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.702294E+00 | loss scale: 262144.0 | grad norm: 208700.628 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.41 | iteration 15039/ 292968 | consumed samples: 30799872 | consumed tokens: 14612856832 | elapsed time per iteration (ms): 127743.4 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.759710E+00 | loss scale: 262144.0 | grad norm: 229511.208 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 91.98 | iteration 15040/ 292968 | consumed samples: 30801920 | consumed tokens: 14614675456 | elapsed time per iteration (ms): 123797.1 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.683158E+00 | loss scale: 262144.0 | grad norm: 194255.396 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 94.91 | iteration 15041/ 292968 | consumed samples: 30803968 | consumed tokens: 14616494080 | elapsed time per iteration (ms): 119538.9 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.742655E+00 | loss scale: 262144.0 | grad norm: 187524.500 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.29 | iteration 15042/ 292968 | consumed samples: 30806016 | consumed tokens: 14618312704 | elapsed time per iteration (ms): 122847.2 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.730507E+00 | loss scale: 262144.0 | grad norm: 180661.324 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 95.64 | iteration 15043/ 292968 | consumed samples: 30808064 | consumed tokens: 14620131328 | elapsed time per iteration (ms): 122518.8 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.719227E+00 | loss scale: 262144.0 | grad norm: 229735.775 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 95.90 | iteration 15044/ 292968 | consumed samples: 30810112 | consumed tokens: 14621949952 | elapsed time per iteration (ms): 123755.5 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.708852E+00 | loss scale: 262144.0 | grad norm: 277670.911 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 94.94 | iteration 15045/ 292968 | consumed samples: 30812160 | consumed tokens: 14623768576 | elapsed time per iteration (ms): 125182.4 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.703521E+00 | loss scale: 262144.0 | grad norm: 178497.782 | num zeros: 0.0 | curriculum seqlen: 888 
| number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 93.86 | iteration 15046/ 292968 | consumed samples: 30814208 | consumed tokens: 14625587200 | elapsed time per iteration (ms): 124813.9 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.706630E+00 | loss scale: 262144.0 | grad norm: 158797.400 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.14 | iteration 15047/ 292968 | consumed samples: 30816256 | consumed tokens: 14627405824 | elapsed time per iteration (ms): 124970.0 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.723911E+00 | loss scale: 262144.0 | grad norm: 127589.577 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.02 | iteration 15048/ 292968 | consumed samples: 30818304 | consumed tokens: 14629224448 | elapsed time per iteration (ms): 125489.2 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.702717E+00 | loss scale: 262144.0 | grad norm: 148022.964 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 93.63 | iteration 15049/ 292968 | consumed samples: 30820352 | consumed tokens: 14631043072 | elapsed time per iteration (ms): 123838.7 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.728228E+00 | loss scale: 262144.0 | grad norm: 144180.805 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 94.88 | iteration 15050/ 292968 | consumed samples: 30822400 | consumed tokens: 14632861696 | elapsed time per iteration (ms): 124439.7 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.710845E+00 | loss scale: 262144.0 | grad norm: 121193.215 | num zeros: 0.0 | 
curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.42 |
saving checkpoint at iteration 15050 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
[2022-01-26 21:00:45,826] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/mp_rank_00_model_states.pt
[2022-01-26 21:00:46,202] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/mp_rank_01_model_states.pt
[2022-01-26 21:00:59,250] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_110_optim_states.pt
[2022-01-26 21:00:59,703] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_106_optim_states.pt
[2022-01-26 21:01:02,123] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_111_optim_states.pt
[2022-01-26 21:01:02,854] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_104_optim_states.pt
[2022-01-26 21:01:02,957] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_107_optim_states.pt
[2022-01-26 21:01:02,984] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-01-26 21:01:03,009] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-01-26 21:01:03,087] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-01-26 21:01:07,680] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-01-26 21:01:08,684] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-01-26 21:01:08,797] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-01-26 21:01:09,932] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-01-26 21:01:10,005] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-01-26 21:01:10,547] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-01-26 21:01:11,094] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-01-26 21:01:11,706] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-01-26 21:01:11,986] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-01-26 21:01:12,031] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-01-26 21:01:12,042] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-01-26 21:01:12,074] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-01-26 21:01:12,109] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-01-26 21:01:12,167] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-01-26 21:01:12,188] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-01-26 
21:01:12,215] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-01-26 21:01:12,243] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-01-26 21:01:12,318] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-01-26 21:01:12,327] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-01-26 21:01:12,423] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-01-26 21:01:12,553] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-01-26 21:01:12,625] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-01-26 21:01:12,684] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-01-26 21:01:12,768] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-01-26 21:01:12,856] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-01-26 21:01:12,871] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-01-26 21:01:12,960] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-01-26 21:01:13,024] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-01-26 21:01:13,423] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-01-26 21:01:13,476] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-01-26 21:01:13,562] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-01-26 21:01:13,594] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-01-26 21:01:13,907] [INFO] [engine.py:3007:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-01-26 21:01:13,912] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-01-26 21:01:13,905] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-01-26 21:01:14,006] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-01-26 21:01:14,051] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-01-26 21:01:14,073] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-01-26 21:01:14,061] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_124_optim_states.pt [2022-01-26 21:01:14,065] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-01-26 21:01:14,278] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-01-26 21:01:14,528] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-01-26 21:01:14,677] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-01-26 21:01:14,863] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-01-26 21:01:14,868] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-01-26 21:01:14,853] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-01-26 21:01:14,901] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-01-26 21:01:14,871] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-01-26 21:01:15,055] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-01-26 21:01:15,168] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-01-26 
21:01:15,293] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-01-26 21:01:15,314] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-01-26 21:01:15,310] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-01-26 21:01:15,366] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-01-26 21:01:15,404] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-01-26 21:01:15,747] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-01-26 21:01:15,810] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-01-26 21:01:15,879] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-01-26 21:01:16,241] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-01-26 21:01:16,361] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-01-26 21:01:16,404] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-01-26 21:01:16,543] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-01-26 21:01:16,612] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-01-26 21:01:16,788] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-01-26 21:01:16,828] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-01-26 21:01:16,788] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-01-26 21:01:16,886] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-01-26 21:01:16,894] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-01-26 21:01:16,912] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-01-26 21:01:17,008] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-01-26 21:01:17,049] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-01-26 21:01:17,187] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-01-26 21:01:17,203] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-01-26 21:01:17,221] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-01-26 21:01:17,226] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-01-26 21:01:17,243] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-01-26 21:01:17,269] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-01-26 21:01:17,268] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-01-26 21:01:17,270] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-01-26 21:01:17,403] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-01-26 21:01:17,561] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-01-26 21:01:17,576] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-01-26 21:01:17,604] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-01-26 21:01:17,665] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-01-26 21:01:17,610] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-01-26 
21:01:17,656] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-01-26 21:01:17,752] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-01-26 21:01:17,799] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-01-26 21:01:17,854] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-01-26 21:01:17,769] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-01-26 21:01:18,021] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-01-26 21:01:18,187] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-01-26 21:01:18,328] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-01-26 21:01:18,410] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-01-26 21:01:18,551] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-01-26 21:01:18,709] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-01-26 21:01:18,758] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-01-26 21:01:19,378] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-01-26 21:01:19,834] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-01-26 21:01:19,931] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-01-26 21:01:19,932] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-01-26 21:01:19,989] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-01-26 21:01:20,670] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-01-26 21:01:20,660] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-01-26 21:01:21,288] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-01-26 21:01:23,355] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-01-26 21:01:23,535] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-01-26 21:01:23,834] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-01-26 21:01:26,139] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-01-26 21:01:26,147] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-01-26 21:01:26,250] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-01-26 21:01:26,289] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-01-26 21:01:26,318] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-01-26 21:01:26,625] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-01-26 21:01:27,583] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-01-26 21:01:27,594] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-01-26 21:01:32,696] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-01-26 21:01:32,701] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-01-26 21:01:37,218] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-01-26 21:01:37,281] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15050/zero_pp_rank_0_mp_rank_00_optim_states.pt successfully 
saved checkpoint at iteration 15050 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 time (ms) | save-checkpoint: 56425.33 iteration 15051/ 292968 | consumed samples: 30824448 | consumed tokens: 14634680320 | elapsed time per iteration (ms): 206194.8 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.721343E+00 | loss scale: 262144.0 | grad norm: 121920.799 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.010 | TFLOPs: 56.98 | iteration 15052/ 292968 | consumed samples: 30826496 | consumed tokens: 14636498944 | elapsed time per iteration (ms): 140111.7 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.733590E+00 | loss scale: 262144.0 | grad norm: 156930.054 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 83.86 | iteration 15053/ 292968 | consumed samples: 30828544 | consumed tokens: 14638317568 | elapsed time per iteration (ms): 143262.8 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.700775E+00 | loss scale: 262144.0 | grad norm: 152528.468 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 82.01 | iteration 15054/ 292968 | consumed samples: 30830592 | consumed tokens: 14640136192 | elapsed time per iteration (ms): 144950.9 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.733440E+00 | loss scale: 262144.0 | grad norm: 114917.748 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 81.06 | iteration 15055/ 292968 | consumed samples: 30832640 | consumed tokens: 14641954816 | elapsed time per iteration (ms): 144797.0 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.716497E+00 | loss scale: 262144.0 | grad 
norm: 108427.819 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 81.15 | iteration 15056/ 292968 | consumed samples: 30834688 | consumed tokens: 14643773440 | elapsed time per iteration (ms): 144819.1 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.734305E+00 | loss scale: 262144.0 | grad norm: 140552.338 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 81.13 | iteration 15057/ 292968 | consumed samples: 30836736 | consumed tokens: 14645592064 | elapsed time per iteration (ms): 140297.0 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.705750E+00 | loss scale: 262144.0 | grad norm: 188827.255 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 83.75 | iteration 15058/ 292968 | consumed samples: 30838784 | consumed tokens: 14647410688 | elapsed time per iteration (ms): 144221.3 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.715857E+00 | loss scale: 262144.0 | grad norm: 224964.568 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 81.47 | iteration 15059/ 292968 | consumed samples: 30840832 | consumed tokens: 14649229312 | elapsed time per iteration (ms): 144130.9 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.714161E+00 | loss scale: 262144.0 | grad norm: 258383.503 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 81.52 | iteration 15060/ 292968 | consumed samples: 30842880 | consumed tokens: 14651047936 | elapsed time per iteration (ms): 143417.7 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.720926E+00 | loss 
scale: 262144.0 | grad norm: 241269.540 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 81.93 | iteration 15061/ 292968 | consumed samples: 30844928 | consumed tokens: 14652866560 | elapsed time per iteration (ms): 143706.9 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.714535E+00 | loss scale: 262144.0 | grad norm: 202399.583 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 81.76 | iteration 15062/ 292968 | consumed samples: 30846976 | consumed tokens: 14654685184 | elapsed time per iteration (ms): 142451.5 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.722237E+00 | loss scale: 262144.0 | grad norm: 191846.583 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 82.48 | iteration 15063/ 292968 | consumed samples: 30849024 | consumed tokens: 14656503808 | elapsed time per iteration (ms): 147147.7 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.724993E+00 | loss scale: 262144.0 | grad norm: 226460.865 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 79.85 | iteration 15064/ 292968 | consumed samples: 30851072 | consumed tokens: 14658322432 | elapsed time per iteration (ms): 148134.4 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.710265E+00 | loss scale: 262144.0 | grad norm: 228480.954 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 79.32 | iteration 15065/ 292968 | consumed samples: 30853120 | consumed tokens: 14660141056 | elapsed time per iteration (ms): 152809.0 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 
2.712543E+00 | loss scale: 262144.0 | grad norm: 170345.291 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 76.89 | iteration 15066/ 292968 | consumed samples: 30855168 | consumed tokens: 14661959680 | elapsed time per iteration (ms): 153298.9 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.722568E+00 | loss scale: 262144.0 | grad norm: 193284.572 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 76.64 | iteration 15067/ 292968 | consumed samples: 30857216 | consumed tokens: 14663778304 | elapsed time per iteration (ms): 154939.9 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.707795E+00 | loss scale: 262144.0 | grad norm: 198670.800 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 75.83 | iteration 15068/ 292968 | consumed samples: 30859264 | consumed tokens: 14665596928 | elapsed time per iteration (ms): 151555.8 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.718195E+00 | loss scale: 262144.0 | grad norm: 177024.175 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 77.53 | iteration 15069/ 292968 | consumed samples: 30861312 | consumed tokens: 14667415552 | elapsed time per iteration (ms): 146786.0 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.720861E+00 | loss scale: 262144.0 | grad norm: 157300.441 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 80.05 | iteration 15070/ 292968 | consumed samples: 30863360 | consumed tokens: 14669234176 | elapsed time per iteration (ms): 144140.7 | learning rate: 5.958E-05 | global batch 
size: 2048 | lm loss: 2.719362E+00 | loss scale: 262144.0 | grad norm: 152242.214 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 81.51 | iteration 15071/ 292968 | consumed samples: 30865408 | consumed tokens: 14671052800 | elapsed time per iteration (ms): 141576.4 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.744251E+00 | loss scale: 262144.0 | grad norm: 203446.207 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 82.99 | iteration 15072/ 292968 | consumed samples: 30867456 | consumed tokens: 14672871424 | elapsed time per iteration (ms): 143572.7 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.714924E+00 | loss scale: 262144.0 | grad norm: 200091.984 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 81.84 | iteration 15073/ 292968 | consumed samples: 30869504 | consumed tokens: 14674690048 | elapsed time per iteration (ms): 142622.8 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.762680E+00 | loss scale: 262144.0 | grad norm: 189698.706 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 82.38 | iteration 15074/ 292968 | consumed samples: 30871552 | consumed tokens: 14676508672 | elapsed time per iteration (ms): 141205.3 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.760294E+00 | loss scale: 262144.0 | grad norm: 152052.174 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 83.21 | iteration 15075/ 292968 | consumed samples: 30873600 | consumed tokens: 14678327296 | elapsed time per iteration (ms): 141334.3 | learning rate: 
5.958E-05 | global batch size: 2048 | lm loss: 2.784089E+00 | loss scale: 262144.0 | grad norm: 250317.687 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 83.13 | iteration 15076/ 292968 | consumed samples: 30875648 | consumed tokens: 14680145920 | elapsed time per iteration (ms): 140645.1 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.749587E+00 | loss scale: 262144.0 | grad norm: 284976.603 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 83.54 | iteration 15077/ 292968 | consumed samples: 30877696 | consumed tokens: 14681964544 | elapsed time per iteration (ms): 138570.1 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.733856E+00 | loss scale: 262144.0 | grad norm: 164303.338 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 84.79 | iteration 15078/ 292968 | consumed samples: 30879744 | consumed tokens: 14683783168 | elapsed time per iteration (ms): 141315.6 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.730196E+00 | loss scale: 262144.0 | grad norm: 217560.016 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 83.14 | iteration 15079/ 292968 | consumed samples: 30881792 | consumed tokens: 14685601792 | elapsed time per iteration (ms): 134624.9 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.713011E+00 | loss scale: 262144.0 | grad norm: 149968.575 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 87.28 | iteration 15080/ 292968 | consumed samples: 30883840 | consumed tokens: 14687420416 | elapsed time per iteration (ms): 
134420.1 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.724907E+00 | loss scale: 262144.0 | grad norm: 154894.340 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 87.41 | iteration 15081/ 292968 | consumed samples: 30885888 | consumed tokens: 14689239040 | elapsed time per iteration (ms): 133954.8 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.707349E+00 | loss scale: 262144.0 | grad norm: 161179.381 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 87.71 | iteration 15082/ 292968 | consumed samples: 30887936 | consumed tokens: 14691057664 | elapsed time per iteration (ms): 135488.1 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.715438E+00 | loss scale: 262144.0 | grad norm: 85067.271 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 86.72 | iteration 15083/ 292968 | consumed samples: 30889984 | consumed tokens: 14692876288 | elapsed time per iteration (ms): 136209.2 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.735508E+00 | loss scale: 262144.0 | grad norm: 131648.458 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 86.26 | iteration 15084/ 292968 | consumed samples: 30892032 | consumed tokens: 14694694912 | elapsed time per iteration (ms): 133536.2 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.744684E+00 | loss scale: 262144.0 | grad norm: 167271.530 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 87.99 | iteration 15085/ 292968 | consumed samples: 30894080 | consumed tokens: 14696513536 | elapsed time 
per iteration (ms): 136734.0 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.708726E+00 | loss scale: 262144.0 | grad norm: 122635.980 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 85.93 | iteration 15086/ 292968 | consumed samples: 30896128 | consumed tokens: 14698332160 | elapsed time per iteration (ms): 142313.6 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.698088E+00 | loss scale: 262144.0 | grad norm: 134885.215 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 82.56 | iteration 15087/ 292968 | consumed samples: 30898176 | consumed tokens: 14700150784 | elapsed time per iteration (ms): 131633.8 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.737584E+00 | loss scale: 262144.0 | grad norm: 125399.810 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 89.26 | iteration 15088/ 292968 | consumed samples: 30900224 | consumed tokens: 14701969408 | elapsed time per iteration (ms): 132258.5 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.691245E+00 | loss scale: 262144.0 | grad norm: 92128.484 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 88.84 | iteration 15089/ 292968 | consumed samples: 30902272 | consumed tokens: 14703788032 | elapsed time per iteration (ms): 128143.9 | learning rate: 5.958E-05 | global batch size: 2048 | lm loss: 2.707133E+00 | loss scale: 262144.0 | grad norm: 121463.998 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 91.69 | iteration 15090/ 292968 | consumed samples: 30904320 | consumed tokens: 
14705606656 | elapsed time per iteration (ms): 125401.7 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.706694E+00 | loss scale: 262144.0 | grad norm: 151373.013 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 93.70 | iteration 15091/ 292968 | consumed samples: 30906368 | consumed tokens: 14707425280 | elapsed time per iteration (ms): 129300.8 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.707309E+00 | loss scale: 262144.0 | grad norm: 176859.999 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 90.87 | iteration 15092/ 292968 | consumed samples: 30908416 | consumed tokens: 14709243904 | elapsed time per iteration (ms): 127329.6 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.713504E+00 | loss scale: 262144.0 | grad norm: 213939.902 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 92.28 | iteration 15093/ 292968 | consumed samples: 30910464 | consumed tokens: 14711062528 | elapsed time per iteration (ms): 126767.3 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.704916E+00 | loss scale: 262144.0 | grad norm: 230817.690 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 92.69 | iteration 15094/ 292968 | consumed samples: 30912512 | consumed tokens: 14712881152 | elapsed time per iteration (ms): 125337.4 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.698747E+00 | loss scale: 262144.0 | grad norm: 210929.025 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 93.74 | iteration 15095/ 292968 | consumed samples: 
30914560 | consumed tokens: 14714699776 | elapsed time per iteration (ms): 127794.9 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.712384E+00 | loss scale: 262144.0 | grad norm: 215797.079 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 91.94 | iteration 15096/ 292968 | consumed samples: 30916608 | consumed tokens: 14716518400 | elapsed time per iteration (ms): 127364.7 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.712807E+00 | loss scale: 262144.0 | grad norm: 253016.337 | num zeros: 0.0 | curriculum seqlen: 888 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 92.25 | iteration 15097/ 292968 | consumed samples: 30918656 | consumed tokens: 14718353408 | elapsed time per iteration (ms): 128336.2 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.723367E+00 | loss scale: 262144.0 | grad norm: 233540.562 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 92.38 | iteration 15098/ 292968 | consumed samples: 30920704 | consumed tokens: 14720188416 | elapsed time per iteration (ms): 133677.8 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.716611E+00 | loss scale: 262144.0 | grad norm: 205371.123 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 88.69 | iteration 15099/ 292968 | consumed samples: 30922752 | consumed tokens: 14722023424 | elapsed time per iteration (ms): 128734.7 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.706893E+00 | loss scale: 262144.0 | grad norm: 221306.617 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 92.09 | iteration 15100/ 292968 
| consumed samples: 30924800 | consumed tokens: 14723858432 | elapsed time per iteration (ms): 123186.2 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.710426E+00 | loss scale: 262144.0 | grad norm: 181411.446 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 96.24 | saving checkpoint at iteration 15100 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 [2022-01-26 22:57:20,272] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/mp_rank_01_model_states.pt [2022-01-26 22:57:20,335] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/mp_rank_00_model_states.pt [2022-01-26 22:57:34,056] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-01-26 22:57:34,442] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-01-26 22:57:34,627] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-01-26 22:57:35,220] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-01-26 22:57:36,346] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-01-26 22:57:36,370] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-01-26 22:57:36,823] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-01-26 22:57:36,845] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-01-26 22:57:38,309] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-01-26 22:57:41,375] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-01-26 22:57:41,871] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-01-26 22:57:43,111] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-01-26 22:57:43,735] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-01-26 22:57:43,815] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-01-26 22:57:43,836] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-01-26 22:57:44,512] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-01-26 22:57:44,581] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_124_optim_states.pt [2022-01-26 22:57:45,037] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-01-26 22:57:45,136] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-01-26 22:57:45,832] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-01-26 22:57:45,902] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-01-26 22:57:46,212] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-01-26 
22:57:46,235] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-01-26 22:57:46,390] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-01-26 22:57:46,530] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-01-26 22:57:46,688] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-01-26 22:57:46,889] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-01-26 22:57:46,920] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-01-26 22:57:46,907] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-01-26 22:57:47,182] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-01-26 22:57:47,198] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-01-26 22:57:47,207] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-01-26 22:57:47,324] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-01-26 22:57:47,468] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-01-26 22:57:47,768] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-01-26 22:57:47,856] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-01-26 22:57:47,951] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-01-26 22:57:48,218] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-01-26 22:57:48,172] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-01-26 22:57:48,252] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-01-26 22:57:48,368] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-01-26 22:57:48,405] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-01-26 22:57:48,400] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-01-26 22:57:48,475] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-01-26 22:57:48,619] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-01-26 22:57:48,628] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-01-26 22:57:48,689] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-01-26 22:57:48,731] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-01-26 22:57:48,760] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-01-26 22:57:48,798] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-01-26 22:57:48,837] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-01-26 22:57:48,836] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-01-26 22:57:48,870] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-01-26 22:57:49,073] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-01-26 22:57:49,147] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-01-26 22:57:49,248] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-01-26 22:57:49,271] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-01-26 
22:57:49,431] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-01-26 22:57:49,549] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-01-26 22:57:49,765] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-01-26 22:57:49,776] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-01-26 22:57:49,934] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-01-26 22:57:49,934] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-01-26 22:57:49,962] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-01-26 22:57:50,017] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-01-26 22:57:50,046] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-01-26 22:57:50,270] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-01-26 22:57:50,173] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-01-26 22:57:50,294] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-01-26 22:57:50,353] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-01-26 22:57:50,375] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-01-26 22:57:50,394] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-01-26 22:57:50,407] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-01-26 22:57:50,575] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-01-26 22:57:50,702] [INFO] [engine.py:3007:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-01-26 22:57:50,758] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-01-26 22:57:50,895] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-01-26 22:57:50,983] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-01-26 22:57:51,057] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-01-26 22:57:51,139] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-01-26 22:57:51,152] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-01-26 22:57:51,185] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-01-26 22:57:51,357] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-01-26 22:57:51,393] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-01-26 22:57:51,424] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-01-26 22:57:51,945] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-01-26 22:57:52,012] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-01-26 22:57:52,113] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-01-26 22:57:52,120] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-01-26 22:57:52,637] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-01-26 22:57:52,720] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-01-26 22:57:52,983] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-01-26 
22:57:53,019] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-01-26 22:57:53,030] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-01-26 22:57:53,126] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-01-26 22:57:53,215] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-01-26 22:57:53,222] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-01-26 22:57:53,305] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-01-26 22:57:53,421] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-01-26 22:57:53,455] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-01-26 22:57:53,458] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-01-26 22:57:53,463] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-01-26 22:57:53,539] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-01-26 22:57:53,728] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-01-26 22:57:54,022] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-01-26 22:57:54,076] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-01-26 22:57:53,964] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-01-26 22:57:54,019] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-01-26 22:57:54,378] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-01-26 22:57:54,455] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-01-26 22:57:54,491] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-01-26 22:57:54,562] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-01-26 22:57:54,572] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-01-26 22:57:54,657] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-01-26 22:57:54,715] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-01-26 22:57:55,458] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-01-26 22:57:55,473] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-01-26 22:57:55,545] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-01-26 22:57:55,635] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-01-26 22:57:56,540] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-01-26 22:57:56,985] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-01-26 22:57:57,005] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-01-26 22:57:57,058] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-01-26 22:57:57,262] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-01-26 22:57:57,962] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-01-26 22:57:58,047] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-01-26 22:57:59,362] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-01-26 
22:57:59,516] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15100/zero_pp_rank_0_mp_rank_66_optim_states.pt
successfully saved checkpoint at iteration 15100 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
time (ms) | save-checkpoint: 44095.95
iteration 15101/ 292968 | consumed samples: 30926848 | consumed tokens: 14725693440 | elapsed time per iteration (ms): 167326.8 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.699902E+00 | loss scale: 262144.0 | grad norm: 214810.508 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 70.85 |
iteration 15102/ 292968 | consumed samples: 30928896 | consumed tokens: 14727528448 | elapsed time per iteration (ms): 118075.7 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.706147E+00 | loss scale: 262144.0 | grad norm: 199274.466 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.41 |
iteration 15103/ 292968 | consumed samples: 30930944 | consumed tokens: 14729363456 | elapsed time per iteration (ms): 120450.7 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.711918E+00 | loss scale: 262144.0 | grad norm: 143104.282 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.43 |
iteration 15104/ 292968 | consumed samples: 30932992 | consumed tokens: 14731198464 | elapsed time per iteration (ms): 117238.8 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.680527E+00 | loss scale: 262144.0 | grad norm: 231508.942 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.12 |
iteration 15105/ 292968 | consumed samples: 30935040 | consumed tokens: 14733033472 | elapsed time per iteration (ms): 115589.5 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.694654E+00 | loss scale: 262144.0 | grad norm: 156207.088 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.018 | TFLOPs: 102.56 |
iteration 15106/ 292968 | consumed samples: 30937088 | consumed tokens: 14734868480 | elapsed time per iteration (ms): 114056.1 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.687599E+00 | loss scale: 262144.0 | grad norm: 168107.709 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.018 | TFLOPs: 103.94 |
iteration 15107/ 292968 | consumed samples: 30939136 | consumed tokens: 14736703488 | elapsed time per iteration (ms): 113081.8 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.685125E+00 | loss scale: 262144.0 | grad norm: 173324.755 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.018 | TFLOPs: 104.84 |
iteration 15108/ 292968 | consumed samples: 30941184 | consumed tokens: 14738538496 | elapsed time per iteration (ms): 112507.9 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.726460E+00 | loss scale: 262144.0 | grad norm: 129537.640 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.018 | TFLOPs: 105.37 |
iteration 15109/ 292968 | consumed samples: 30943232 | consumed tokens: 14740373504 | elapsed time per iteration (ms): 114697.7 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.690893E+00 | loss scale: 262144.0 | grad norm: 152711.751 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.018 | TFLOPs: 103.36 |
iteration 15110/ 292968 | consumed samples: 30945280 | consumed tokens: 14742208512 | elapsed time per iteration (ms): 120210.1 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.709717E+00 | loss scale: 262144.0 | grad norm: 123486.020 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.62 |
iteration 15111/ 292968 | consumed samples: 30947328 | consumed tokens: 14744043520 | elapsed time per iteration (ms): 124238.5 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.698303E+00 | loss scale: 262144.0 | grad norm: 113350.654 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.42 |
iteration 15112/ 292968 | consumed samples: 30949376 | consumed tokens: 14745878528 | elapsed time per iteration (ms): 124060.3 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.679046E+00 | loss scale: 262144.0 | grad norm: 173238.401 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 95.56 |
iteration 15113/ 292968 | consumed samples: 30951424 | consumed tokens: 14747713536 | elapsed time per iteration (ms): 125377.0 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.710041E+00 | loss scale: 262144.0 | grad norm: 211265.955 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.56 |
iteration 15114/ 292968 | consumed samples: 30953472 | consumed tokens: 14749548544 | elapsed time per iteration (ms): 124707.7 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.710313E+00 | loss scale: 262144.0 | grad norm: 117349.387 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 0.016 | TFLOPs: 95.07 | iteration 15115/ 292968 | consumed samples: 30955520 | consumed tokens: 14751383552 | elapsed time per iteration (ms): 126407.5 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.717447E+00 | loss scale: 262144.0 | grad norm: 134095.334 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 93.79 | iteration 15116/ 292968 | consumed samples: 30957568 | consumed tokens: 14753218560 | elapsed time per iteration (ms): 123035.1 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.701715E+00 | loss scale: 262144.0 | grad norm: 172861.165 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 96.36 | iteration 15117/ 292968 | consumed samples: 30959616 | consumed tokens: 14755053568 | elapsed time per iteration (ms): 126506.6 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.681170E+00 | loss scale: 262144.0 | grad norm: 236990.439 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 93.71 | iteration 15118/ 292968 | consumed samples: 30961664 | consumed tokens: 14756888576 | elapsed time per iteration (ms): 125196.3 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.708935E+00 | loss scale: 262144.0 | grad norm: 258853.818 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.69 | iteration 15119/ 292968 | consumed samples: 30963712 | consumed tokens: 14758723584 | elapsed time per iteration (ms): 125613.0 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.725596E+00 | loss scale: 262144.0 | grad norm: 218070.250 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number 
of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.38 | iteration 15120/ 292968 | consumed samples: 30965760 | consumed tokens: 14760558592 | elapsed time per iteration (ms): 120458.6 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.700513E+00 | loss scale: 262144.0 | grad norm: 207284.989 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.42 | iteration 15121/ 292968 | consumed samples: 30967808 | consumed tokens: 14762393600 | elapsed time per iteration (ms): 119226.1 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.724540E+00 | loss scale: 262144.0 | grad norm: 225999.936 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.44 | iteration 15122/ 292968 | consumed samples: 30969856 | consumed tokens: 14764228608 | elapsed time per iteration (ms): 117689.7 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.721982E+00 | loss scale: 262144.0 | grad norm: 196163.171 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.73 | iteration 15123/ 292968 | consumed samples: 30971904 | consumed tokens: 14766063616 | elapsed time per iteration (ms): 115622.2 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.682038E+00 | loss scale: 262144.0 | grad norm: 138042.391 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.018 | TFLOPs: 102.54 | iteration 15124/ 292968 | consumed samples: 30973952 | consumed tokens: 14767898624 | elapsed time per iteration (ms): 117361.9 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.731923E+00 | loss scale: 262144.0 | grad norm: 148985.642 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.02 | iteration 15125/ 292968 | consumed samples: 30976000 | consumed tokens: 14769733632 | elapsed time per iteration (ms): 115704.9 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.706064E+00 | loss scale: 262144.0 | grad norm: 181527.149 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.018 | TFLOPs: 102.46 | iteration 15126/ 292968 | consumed samples: 30978048 | consumed tokens: 14771568640 | elapsed time per iteration (ms): 114064.9 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.741679E+00 | loss scale: 262144.0 | grad norm: 229554.427 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.018 | TFLOPs: 103.94 | iteration 15127/ 292968 | consumed samples: 30980096 | consumed tokens: 14773403648 | elapsed time per iteration (ms): 115258.8 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.746849E+00 | loss scale: 262144.0 | grad norm: 233193.089 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.018 | TFLOPs: 102.86 | iteration 15128/ 292968 | consumed samples: 30982144 | consumed tokens: 14775238656 | elapsed time per iteration (ms): 115954.3 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.737311E+00 | loss scale: 262144.0 | grad norm: 234736.302 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.018 | TFLOPs: 102.24 | iteration 15129/ 292968 | consumed samples: 30984192 | consumed tokens: 14777073664 | elapsed time per iteration (ms): 115891.5 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.744499E+00 | loss scale: 262144.0 | grad norm: 256630.603 | num zeros: 0.0 | curriculum 
seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.018 | TFLOPs: 102.30 | iteration 15130/ 292968 | consumed samples: 30986240 | consumed tokens: 14778908672 | elapsed time per iteration (ms): 115468.7 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.734002E+00 | loss scale: 262144.0 | grad norm: 233138.565 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.018 | TFLOPs: 102.67 | iteration 15131/ 292968 | consumed samples: 30988288 | consumed tokens: 14780743680 | elapsed time per iteration (ms): 115013.5 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.734467E+00 | loss scale: 262144.0 | grad norm: 192415.931 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.018 | TFLOPs: 103.08 | iteration 15132/ 292968 | consumed samples: 30990336 | consumed tokens: 14782578688 | elapsed time per iteration (ms): 114928.6 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.692795E+00 | loss scale: 262144.0 | grad norm: 216076.188 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.018 | TFLOPs: 103.15 | iteration 15133/ 292968 | consumed samples: 30992384 | consumed tokens: 14784413696 | elapsed time per iteration (ms): 115201.7 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.724862E+00 | loss scale: 262144.0 | grad norm: 169150.926 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.018 | TFLOPs: 102.91 | iteration 15134/ 292968 | consumed samples: 30994432 | consumed tokens: 14786248704 | elapsed time per iteration (ms): 116516.3 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.738401E+00 | loss scale: 262144.0 | grad norm: 147765.874 | 
num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.018 | TFLOPs: 101.75 | iteration 15135/ 292968 | consumed samples: 30996480 | consumed tokens: 14788083712 | elapsed time per iteration (ms): 115085.6 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.726109E+00 | loss scale: 262144.0 | grad norm: 167540.050 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.018 | TFLOPs: 103.01 | iteration 15136/ 292968 | consumed samples: 30998528 | consumed tokens: 14789918720 | elapsed time per iteration (ms): 116876.9 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.716403E+00 | loss scale: 262144.0 | grad norm: 139117.149 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.018 | TFLOPs: 101.44 | iteration 15137/ 292968 | consumed samples: 31000576 | consumed tokens: 14791753728 | elapsed time per iteration (ms): 116174.7 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.706878E+00 | loss scale: 262144.0 | grad norm: 134803.070 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.018 | TFLOPs: 102.05 | iteration 15138/ 292968 | consumed samples: 31002624 | consumed tokens: 14793588736 | elapsed time per iteration (ms): 116394.2 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.716515E+00 | loss scale: 262144.0 | grad norm: 131675.278 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.018 | TFLOPs: 101.86 | iteration 15139/ 292968 | consumed samples: 31004672 | consumed tokens: 14795423744 | elapsed time per iteration (ms): 123266.0 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.720357E+00 | loss scale: 262144.0 
| grad norm: 106526.472 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 96.18 | iteration 15140/ 292968 | consumed samples: 31006720 | consumed tokens: 14797258752 | elapsed time per iteration (ms): 117015.8 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.692018E+00 | loss scale: 262144.0 | grad norm: 105059.259 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.018 | TFLOPs: 101.31 | iteration 15141/ 292968 | consumed samples: 31008768 | consumed tokens: 14799093760 | elapsed time per iteration (ms): 121353.1 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.695567E+00 | loss scale: 262144.0 | grad norm: 134498.368 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.69 | iteration 15142/ 292968 | consumed samples: 31010816 | consumed tokens: 14800928768 | elapsed time per iteration (ms): 119905.5 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.691179E+00 | loss scale: 262144.0 | grad norm: 136493.226 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.87 | iteration 15143/ 292968 | consumed samples: 31012864 | consumed tokens: 14802763776 | elapsed time per iteration (ms): 117198.2 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.701777E+00 | loss scale: 262144.0 | grad norm: 157818.363 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.16 | iteration 15144/ 292968 | consumed samples: 31014912 | consumed tokens: 14804598784 | elapsed time per iteration (ms): 119711.8 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.701210E+00 
| loss scale: 262144.0 | grad norm: 189948.120 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.03 | iteration 15145/ 292968 | consumed samples: 31016960 | consumed tokens: 14806433792 | elapsed time per iteration (ms): 118000.6 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.697853E+00 | loss scale: 262144.0 | grad norm: 228729.363 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.47 | iteration 15146/ 292968 | consumed samples: 31019008 | consumed tokens: 14808268800 | elapsed time per iteration (ms): 116818.9 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.722884E+00 | loss scale: 262144.0 | grad norm: 250930.735 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.018 | TFLOPs: 101.49 | iteration 15147/ 292968 | consumed samples: 31021056 | consumed tokens: 14810103808 | elapsed time per iteration (ms): 116507.7 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.700873E+00 | loss scale: 262144.0 | grad norm: 205464.037 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.018 | TFLOPs: 101.76 | iteration 15148/ 292968 | consumed samples: 31023104 | consumed tokens: 14811938816 | elapsed time per iteration (ms): 118334.7 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.718179E+00 | loss scale: 262144.0 | grad norm: 88026.276 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.19 | iteration 15149/ 292968 | consumed samples: 31025152 | consumed tokens: 14813773824 | elapsed time per iteration (ms): 120344.9 | learning rate: 5.957E-05 | global batch size: 2048 
| lm loss: 2.674142E+00 | loss scale: 262144.0 | grad norm: 163995.069 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.51 | iteration 15150/ 292968 | consumed samples: 31027200 | consumed tokens: 14815608832 | elapsed time per iteration (ms): 119201.9 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.706655E+00 | loss scale: 262144.0 | grad norm: 222004.958 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.46 | ------------------------------------------------------------------------------------------- valid loss at iteration 15150 | lm loss value: 3.386908E+00 | lm loss PPL: 2.957438E+01 | ------------------------------------------------------------------------------------------- saving checkpoint at iteration 15150 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 [2022-01-27 00:43:14,053] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/mp_rank_01_model_states.pt [2022-01-27 00:43:14,136] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/mp_rank_00_model_states.pt [2022-01-27 00:43:28,409] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-01-27 00:43:29,627] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-01-27 00:43:29,645] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-01-27 00:43:29,648] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-01-27 00:43:29,784] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-01-27 00:43:29,808] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-01-27 00:43:29,971] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-01-27 00:43:30,072] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-01-27 00:43:35,972] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-01-27 00:43:36,385] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-01-27 00:43:36,871] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-01-27 00:43:37,288] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-01-27 00:43:37,341] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-01-27 00:43:37,558] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-01-27 00:43:37,623] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-01-27 00:43:37,668] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-01-27 00:43:37,814] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-01-27 00:43:38,046] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-01-27 00:43:38,056] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-01-27 00:43:39,422] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-01-27 
00:43:39,853] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-01-27 00:43:39,892] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-01-27 00:43:39,938] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-01-27 00:43:40,006] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-01-27 00:43:40,114] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-01-27 00:43:40,185] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-01-27 00:43:40,311] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-01-27 00:43:40,560] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-01-27 00:43:41,636] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-01-27 00:43:41,698] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-01-27 00:43:41,991] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-01-27 00:43:42,002] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-01-27 00:43:42,149] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-01-27 00:43:42,222] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-01-27 00:43:42,310] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-01-27 00:43:42,295] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-01-27 00:43:42,454] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-01-27 00:43:42,484] [INFO] [engine.py:3007:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-01-27 00:43:42,549] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-01-27 00:43:42,608] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-01-27 00:43:42,588] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-01-27 00:43:42,777] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-01-27 00:43:42,779] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-01-27 00:43:42,808] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-01-27 00:43:42,788] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-01-27 00:43:42,892] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-01-27 00:43:43,023] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-01-27 00:43:43,119] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-01-27 00:43:43,402] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-01-27 00:43:43,584] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-01-27 00:43:43,630] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-01-27 00:43:43,628] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-01-27 00:43:43,706] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-01-27 00:43:43,750] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_124_optim_states.pt [2022-01-27 00:43:43,753] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-01-27 
00:43:43,913] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-01-27 00:43:43,941] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-01-27 00:43:43,972] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-01-27 00:43:44,078] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-01-27 00:43:44,073] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-01-27 00:43:44,072] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-01-27 00:43:44,176] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-01-27 00:43:44,270] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-01-27 00:43:44,495] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-01-27 00:43:44,544] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-01-27 00:43:44,572] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-01-27 00:43:44,578] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-01-27 00:43:44,590] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-01-27 00:43:44,640] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-01-27 00:43:44,649] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-01-27 00:43:44,746] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-01-27 00:43:44,826] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-01-27 00:43:44,785] [INFO] [engine.py:3007:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-01-27 00:43:45,041] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-01-27 00:43:45,516] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-01-27 00:43:45,727] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-01-27 00:43:45,740] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-01-27 00:43:45,773] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-01-27 00:43:45,787] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-01-27 00:43:45,899] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-01-27 00:43:45,941] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-01-27 00:43:46,016] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-01-27 00:43:46,055] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-01-27 00:43:46,104] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-01-27 00:43:46,116] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-01-27 00:43:46,197] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-01-27 00:43:46,783] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-01-27 00:43:46,885] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-01-27 00:43:46,889] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-01-27 00:43:47,086] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-01-27 
00:43:47,238] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-01-27 00:43:47,295] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-01-27 00:43:47,344] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-01-27 00:43:47,380] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-01-27 00:43:47,408] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-01-27 00:43:47,546] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-01-27 00:43:47,514] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-01-27 00:43:47,615] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-01-27 00:43:47,588] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-01-27 00:43:47,884] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-01-27 00:43:47,992] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-01-27 00:43:48,013] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-01-27 00:43:48,045] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-01-27 00:43:48,056] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-01-27 00:43:48,167] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-01-27 00:43:48,218] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-01-27 00:43:48,222] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-01-27 00:43:48,307] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-01-27 00:43:48,412] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-01-27 00:43:48,671] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-01-27 00:43:48,971] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-01-27 00:43:49,191] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-01-27 00:43:49,243] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-01-27 00:43:49,317] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-01-27 00:43:49,352] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-01-27 00:43:49,297] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-01-27 00:43:49,476] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-01-27 00:43:49,414] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-01-27 00:43:49,765] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-01-27 00:43:49,702] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-01-27 00:43:51,121] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-01-27 00:43:51,171] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-01-27 00:43:51,956] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-01-27 00:43:52,137] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-01-27 00:43:53,700] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-01-27 
00:43:53,703] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-01-27 00:43:54,846] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-01-27 00:43:55,136] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15150/zero_pp_rank_0_mp_rank_126_optim_states.pt successfully saved checkpoint at iteration 15150 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 time (ms) | save-checkpoint: 45593.86 iteration 15151/ 292968 | consumed samples: 31029248 | consumed tokens: 14817443840 | elapsed time per iteration (ms): 563594.0 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.704700E+00 | loss scale: 262144.0 | grad norm: 159104.130 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.004 | TFLOPs: 21.04 | iteration 15152/ 292968 | consumed samples: 31031296 | consumed tokens: 14819278848 | elapsed time per iteration (ms): 135510.5 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.703635E+00 | loss scale: 262144.0 | grad norm: 117487.206 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 87.49 | iteration 15153/ 292968 | consumed samples: 31033344 | consumed tokens: 14821113856 | elapsed time per iteration (ms): 131905.7 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.682427E+00 | loss scale: 262144.0 | grad norm: 151107.554 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 0.016 | TFLOPs: 89.88 | iteration 15154/ 292968 | consumed samples: 31035392 | consumed tokens: 14822948864 | elapsed time per iteration (ms): 128985.8 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.717440E+00 | loss scale: 262144.0 | grad norm: 191060.270 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 91.91 | iteration 15155/ 292968 | consumed samples: 31037440 | consumed tokens: 14824783872 | elapsed time per iteration (ms): 133518.0 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.704767E+00 | loss scale: 262144.0 | grad norm: 185134.645 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 88.79 | iteration 15156/ 292968 | consumed samples: 31039488 | consumed tokens: 14826618880 | elapsed time per iteration (ms): 127681.9 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.706549E+00 | loss scale: 262144.0 | grad norm: 128032.436 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 92.85 | iteration 15157/ 292968 | consumed samples: 31041536 | consumed tokens: 14828453888 | elapsed time per iteration (ms): 125965.3 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.687943E+00 | loss scale: 262144.0 | grad norm: 161485.345 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.12 | iteration 15158/ 292968 | consumed samples: 31043584 | consumed tokens: 14830288896 | elapsed time per iteration (ms): 122763.7 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.700849E+00 | loss scale: 262144.0 | grad norm: 195481.167 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 0.017 | TFLOPs: 96.57 | iteration 15159/ 292968 | consumed samples: 31045632 | consumed tokens: 14832123904 | elapsed time per iteration (ms): 119871.5 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.710926E+00 | loss scale: 262144.0 | grad norm: 179896.481 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.90 | iteration 15160/ 292968 | consumed samples: 31047680 | consumed tokens: 14833958912 | elapsed time per iteration (ms): 118795.0 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.706759E+00 | loss scale: 262144.0 | grad norm: 202507.793 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.80 | iteration 15161/ 292968 | consumed samples: 31049728 | consumed tokens: 14835793920 | elapsed time per iteration (ms): 119121.5 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.688635E+00 | loss scale: 262144.0 | grad norm: 249606.420 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.52 | iteration 15162/ 292968 | consumed samples: 31051776 | consumed tokens: 14837628928 | elapsed time per iteration (ms): 118301.2 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.704326E+00 | loss scale: 262144.0 | grad norm: 249824.096 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.21 | iteration 15163/ 292968 | consumed samples: 31053824 | consumed tokens: 14839463936 | elapsed time per iteration (ms): 117503.8 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.692014E+00 | loss scale: 262144.0 | grad norm: 183586.589 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.89 | iteration 15164/ 292968 | consumed samples: 31055872 | consumed tokens: 14841298944 | elapsed time per iteration (ms): 116810.2 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.709059E+00 | loss scale: 262144.0 | grad norm: 255520.476 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.018 | TFLOPs: 101.49 | [2022-01-27 01:15:08,985] [INFO] [stage_1_and_2.py:1648:step] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 262144.0, reducing to 262144.0 iteration 15165/ 292968 | consumed samples: 31057920 | consumed tokens: 14843133952 | elapsed time per iteration (ms): 118309.5 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.707430E+00 | loss scale: 262144.0 | grad norm: 255520.476 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.21 | iteration 15166/ 292968 | consumed samples: 31059968 | consumed tokens: 14844968960 | elapsed time per iteration (ms): 118459.7 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.675354E+00 | loss scale: 262144.0 | grad norm: 316581.564 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.08 | iteration 15167/ 292968 | consumed samples: 31062016 | consumed tokens: 14846803968 | elapsed time per iteration (ms): 120433.1 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.712146E+00 | loss scale: 262144.0 | grad norm: 112467.473 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.44 | iteration 15168/ 292968 | consumed samples: 31064064 | consumed tokens: 14848638976 | elapsed time per iteration (ms): 120083.1 | 
learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.728077E+00 | loss scale: 262144.0 | grad norm: 294575.939 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.73 | iteration 15169/ 292968 | consumed samples: 31066112 | consumed tokens: 14850473984 | elapsed time per iteration (ms): 120814.6 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.689011E+00 | loss scale: 262144.0 | grad norm: 234671.725 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.13 | iteration 15170/ 292968 | consumed samples: 31068160 | consumed tokens: 14852308992 | elapsed time per iteration (ms): 120029.3 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.697103E+00 | loss scale: 262144.0 | grad norm: 173457.257 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.77 | iteration 15171/ 292968 | consumed samples: 31070208 | consumed tokens: 14854144000 | elapsed time per iteration (ms): 118259.0 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.662925E+00 | loss scale: 262144.0 | grad norm: 207871.207 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.25 | iteration 15172/ 292968 | consumed samples: 31072256 | consumed tokens: 14855979008 | elapsed time per iteration (ms): 121607.6 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.729667E+00 | loss scale: 262144.0 | grad norm: 154024.365 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.49 | iteration 15173/ 292968 | consumed samples: 31074304 | consumed tokens: 14857814016 | elapsed time per 
iteration (ms): 121166.8 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.719599E+00 | loss scale: 262144.0 | grad norm: 157711.225 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.84 | iteration 15174/ 292968 | consumed samples: 31076352 | consumed tokens: 14859649024 | elapsed time per iteration (ms): 120080.6 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.709370E+00 | loss scale: 262144.0 | grad norm: 188949.681 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.73 | iteration 15175/ 292968 | consumed samples: 31078400 | consumed tokens: 14861484032 | elapsed time per iteration (ms): 121023.3 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.706713E+00 | loss scale: 262144.0 | grad norm: 163751.969 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.96 | iteration 15176/ 292968 | consumed samples: 31080448 | consumed tokens: 14863319040 | elapsed time per iteration (ms): 119042.7 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.698449E+00 | loss scale: 262144.0 | grad norm: 135016.430 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.59 | iteration 15177/ 292968 | consumed samples: 31082496 | consumed tokens: 14865154048 | elapsed time per iteration (ms): 119503.7 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.730410E+00 | loss scale: 262144.0 | grad norm: 152639.469 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.21 | iteration 15178/ 292968 | consumed samples: 31084544 | consumed tokens: 
14866989056 | elapsed time per iteration (ms): 119924.5 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.716121E+00 | loss scale: 262144.0 | grad norm: 149474.590 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.86 | iteration 15179/ 292968 | consumed samples: 31086592 | consumed tokens: 14868824064 | elapsed time per iteration (ms): 121238.0 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.711665E+00 | loss scale: 262144.0 | grad norm: 165282.966 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.79 | iteration 15180/ 292968 | consumed samples: 31088640 | consumed tokens: 14870659072 | elapsed time per iteration (ms): 123732.3 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.764697E+00 | loss scale: 262144.0 | grad norm: 186331.038 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 95.82 | iteration 15181/ 292968 | consumed samples: 31090688 | consumed tokens: 14872494080 | elapsed time per iteration (ms): 117811.5 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.763720E+00 | loss scale: 262144.0 | grad norm: 161259.102 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.63 | iteration 15182/ 292968 | consumed samples: 31092736 | consumed tokens: 14874329088 | elapsed time per iteration (ms): 118416.7 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.741344E+00 | loss scale: 262144.0 | grad norm: 162176.860 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.12 | iteration 15183/ 292968 | consumed samples: 
31094784 | consumed tokens: 14876164096 | elapsed time per iteration (ms): 118553.2 | learning rate: 5.957E-05 | global batch size: 2048 | lm loss: 2.760147E+00 | loss scale: 262144.0 | grad norm: 218477.476 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.00 | iteration 15184/ 292968 | consumed samples: 31096832 | consumed tokens: 14877999104 | elapsed time per iteration (ms): 118659.8 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.736692E+00 | loss scale: 262144.0 | grad norm: 256044.963 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.91 | iteration 15185/ 292968 | consumed samples: 31098880 | consumed tokens: 14879834112 | elapsed time per iteration (ms): 119054.9 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.741638E+00 | loss scale: 262144.0 | grad norm: 248396.594 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.58 | iteration 15186/ 292968 | consumed samples: 31100928 | consumed tokens: 14881669120 | elapsed time per iteration (ms): 125373.1 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.737504E+00 | loss scale: 262144.0 | grad norm: 166221.652 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.56 | iteration 15187/ 292968 | consumed samples: 31102976 | consumed tokens: 14883504128 | elapsed time per iteration (ms): 125321.1 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.751354E+00 | loss scale: 262144.0 | grad norm: 143905.170 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.60 | iteration 15188/ 
292968 | consumed samples: 31105024 | consumed tokens: 14885339136 | elapsed time per iteration (ms): 122803.8 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.782909E+00 | loss scale: 262144.0 | grad norm: 191923.546 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 96.54 | iteration 15189/ 292968 | consumed samples: 31107072 | consumed tokens: 14887174144 | elapsed time per iteration (ms): 123737.5 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.789658E+00 | loss scale: 262144.0 | grad norm: 198844.375 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 95.81 | iteration 15190/ 292968 | consumed samples: 31109120 | consumed tokens: 14889009152 | elapsed time per iteration (ms): 121342.4 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 3.002264E+00 | loss scale: 262144.0 | grad norm: 430637.275 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.70 | iteration 15191/ 292968 | consumed samples: 31111168 | consumed tokens: 14890844160 | elapsed time per iteration (ms): 123078.4 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.846073E+00 | loss scale: 262144.0 | grad norm: 302326.525 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 96.32 | iteration 15192/ 292968 | consumed samples: 31113216 | consumed tokens: 14892679168 | elapsed time per iteration (ms): 119159.0 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.815272E+00 | loss scale: 262144.0 | grad norm: 225891.911 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 
99.49 | iteration 15193/ 292968 | consumed samples: 31115264 | consumed tokens: 14894514176 | elapsed time per iteration (ms): 120816.6 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.794705E+00 | loss scale: 262144.0 | grad norm: 166552.105 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.13 | iteration 15194/ 292968 | consumed samples: 31117312 | consumed tokens: 14896349184 | elapsed time per iteration (ms): 122538.5 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.766313E+00 | loss scale: 262144.0 | grad norm: 246914.323 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 96.75 | iteration 15195/ 292968 | consumed samples: 31119360 | consumed tokens: 14898184192 | elapsed time per iteration (ms): 122942.2 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.780852E+00 | loss scale: 262144.0 | grad norm: 153923.082 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 96.43 | iteration 15196/ 292968 | consumed samples: 31121408 | consumed tokens: 14900019200 | elapsed time per iteration (ms): 121075.3 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.767556E+00 | loss scale: 262144.0 | grad norm: 164475.731 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.92 | iteration 15197/ 292968 | consumed samples: 31123456 | consumed tokens: 14901854208 | elapsed time per iteration (ms): 121506.9 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.724036E+00 | loss scale: 262144.0 | grad norm: 153937.108 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 0.017 | TFLOPs: 97.57 | iteration 15198/ 292968 | consumed samples: 31125504 | consumed tokens: 14903689216 | elapsed time per iteration (ms): 122175.9 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.734918E+00 | loss scale: 262144.0 | grad norm: 115771.084 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.04 | iteration 15199/ 292968 | consumed samples: 31127552 | consumed tokens: 14905524224 | elapsed time per iteration (ms): 118468.2 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.739824E+00 | loss scale: 262144.0 | grad norm: 143715.600 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.07 | iteration 15200/ 292968 | consumed samples: 31129600 | consumed tokens: 14907359232 | elapsed time per iteration (ms): 118451.6 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.716005E+00 | loss scale: 262144.0 | grad norm: 113690.014 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.09 | saving checkpoint at iteration 15200 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 [2022-01-27 02:25:40,575] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/mp_rank_00_model_states.pt [2022-01-27 02:25:40,581] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/mp_rank_01_model_states.pt [2022-01-27 02:25:52,970] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_108_optim_states.pt 
[2022-01-27 02:25:53,008] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-01-27 02:25:54,377] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-01-27 02:25:55,925] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-01-27 02:25:55,962] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-01-27 02:25:56,150] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-01-27 02:25:56,375] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-01-27 02:25:56,415] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-01-27 02:26:00,112] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-01-27 02:26:00,291] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-01-27 02:26:00,620] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-01-27 02:26:00,939] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-01-27 02:26:01,077] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-01-27 02:26:01,364] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-01-27 02:26:01,654] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-01-27 02:26:01,702] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-01-27 02:26:02,676] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-01-27 02:26:03,239] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-01-27 02:26:03,704] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-01-27 02:26:03,749] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-01-27 02:26:04,398] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-01-27 02:26:04,768] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-01-27 02:26:04,792] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-01-27 02:26:05,696] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-01-27 02:26:06,010] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-01-27 02:26:06,270] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-01-27 02:26:06,353] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-01-27 02:26:06,678] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-01-27 02:26:07,270] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-01-27 02:26:07,356] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-01-27 02:26:07,421] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-01-27 02:26:07,511] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-01-27 02:26:07,566] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-01-27 02:26:07,596] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-01-27 02:26:07,811] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-01-27 02:26:07,980] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-01-27 
02:26:07,990] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-01-27 02:26:07,992] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-01-27 02:26:08,071] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-01-27 02:26:08,227] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-01-27 02:26:08,325] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-01-27 02:26:08,393] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-01-27 02:26:08,396] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-01-27 02:26:08,695] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-01-27 02:26:08,820] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-01-27 02:26:08,983] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-01-27 02:26:09,254] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-01-27 02:26:09,273] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-01-27 02:26:09,346] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-01-27 02:26:09,444] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-01-27 02:26:09,447] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-01-27 02:26:09,583] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-01-27 02:26:09,605] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-01-27 02:26:09,720] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-01-27 02:26:09,758] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-01-27 02:26:09,830] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-01-27 02:26:09,767] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-01-27 02:26:09,948] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-01-27 02:26:10,439] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-01-27 02:26:10,510] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-01-27 02:26:10,737] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-01-27 02:26:10,851] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-01-27 02:26:10,848] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-01-27 02:26:11,249] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-01-27 02:26:11,280] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-01-27 02:26:11,288] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-01-27 02:26:11,321] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-01-27 02:26:11,386] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-01-27 02:26:11,545] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-01-27 02:26:11,597] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-01-27 02:26:11,601] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-01-27 
02:26:11,654] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-01-27 02:26:11,623] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-01-27 02:26:11,800] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-01-27 02:26:11,825] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-01-27 02:26:12,136] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-01-27 02:26:12,485] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-01-27 02:26:12,577] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-01-27 02:26:12,715] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-01-27 02:26:12,756] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-01-27 02:26:12,861] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-01-27 02:26:13,008] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-01-27 02:26:13,148] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-01-27 02:26:13,142] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-01-27 02:26:13,224] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-01-27 02:26:13,260] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-01-27 02:26:13,340] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-01-27 02:26:13,354] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-01-27 02:26:13,377] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-01-27 02:26:13,386] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-01-27 02:26:13,546] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-01-27 02:26:13,848] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-01-27 02:26:13,990] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-01-27 02:26:14,102] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-01-27 02:26:14,217] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-01-27 02:26:14,499] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-01-27 02:26:14,394] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-01-27 02:26:14,755] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-01-27 02:26:14,788] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-01-27 02:26:15,489] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_124_optim_states.pt [2022-01-27 02:26:15,527] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-01-27 02:26:15,813] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-01-27 02:26:15,980] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-01-27 02:26:16,475] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-01-27 02:26:16,493] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-01-27 02:26:16,560] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-01-27 
02:26:16,579] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-01-27 02:26:16,670] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-01-27 02:26:16,547] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-01-27 02:26:16,806] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-01-27 02:26:16,818] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-01-27 02:26:17,053] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-01-27 02:26:17,785] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-01-27 02:26:17,874] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-01-27 02:26:18,384] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-01-27 02:26:18,490] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-01-27 02:26:18,960] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-01-27 02:26:19,030] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-01-27 02:26:19,324] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-01-27 02:26:19,354] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-01-27 02:26:20,347] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-01-27 02:26:20,437] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-01-27 02:26:22,085] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-01-27 02:26:22,131] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-01-27 02:26:22,149] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-01-27 02:26:22,601] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-01-27 02:26:25,571] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-01-27 02:26:25,665] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15200/zero_pp_rank_0_mp_rank_126_optim_states.pt successfully saved checkpoint at iteration 15200 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 time (ms) | save-checkpoint: 49990.92 iteration 15201/ 292968 | consumed samples: 31131648 | consumed tokens: 14909194240 | elapsed time per iteration (ms): 167082.6 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.728362E+00 | loss scale: 262144.0 | grad norm: 97042.624 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 70.96 | iteration 15202/ 292968 | consumed samples: 31133696 | consumed tokens: 14911029248 | elapsed time per iteration (ms): 118914.4 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.730235E+00 | loss scale: 262144.0 | grad norm: 120237.420 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | 
TFLOPs: 99.70 |
iteration 15203/ 292968 | consumed samples: 31135744 | consumed tokens: 14912864256 | elapsed time per iteration (ms): 117572.9 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.724926E+00 | loss scale: 262144.0 | grad norm: 128068.916 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.83 |
iteration 15204/ 292968 | consumed samples: 31137792 | consumed tokens: 14914699264 | elapsed time per iteration (ms): 119681.3 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.711804E+00 | loss scale: 262144.0 | grad norm: 144865.925 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.06 |
iteration 15205/ 292968 | consumed samples: 31139840 | consumed tokens: 14916534272 | elapsed time per iteration (ms): 117480.8 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.728270E+00 | loss scale: 262144.0 | grad norm: 143087.118 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.91 |
iteration 15206/ 292968 | consumed samples: 31141888 | consumed tokens: 14918369280 | elapsed time per iteration (ms): 116188.8 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.723508E+00 | loss scale: 262144.0 | grad norm: 110297.895 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.018 | TFLOPs: 102.04 |
iteration 15207/ 292968 | consumed samples: 31143936 | consumed tokens: 14920204288 | elapsed time per iteration (ms): 116722.2 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.740158E+00 | loss scale: 262144.0 | grad norm: 251018.265 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.018 | TFLOPs: 101.57 |
[2022-01-27 02:42:05,992] [INFO] [stage_1_and_2.py:1648:step] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 262144.0, reducing to 131072.0
iteration 15208/ 292968 | consumed samples: 31145984 | consumed tokens: 14922039296 | elapsed time per iteration (ms): 116678.0 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 7.287151E+00 | loss scale: 131072.0 | grad norm: 251018.265 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.018 | TFLOPs: 101.61 |
[2022-01-27 02:44:03,669] [INFO] [stage_1_and_2.py:1648:step] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072.0, reducing to 65536.0
iteration 15209/ 292968 | consumed samples: 31148032 | consumed tokens: 14923874304 | elapsed time per iteration (ms): 117677.2 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 7.295066E+00 | loss scale: 65536.0 | grad norm: 251018.265 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.75 |
[2022-01-27 02:46:00,896] [INFO] [stage_1_and_2.py:1648:step] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536.0, reducing to 32768.0
iteration 15210/ 292968 | consumed samples: 31150080 | consumed tokens: 14925709312 | elapsed time per iteration (ms): 117227.2 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 7.304599E+00 | loss scale: 32768.0 | grad norm: 251018.265 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.13 |
iteration 15211/ 292968 | consumed samples: 31152128 | consumed tokens: 14927544320 | elapsed time per iteration (ms): 119512.0 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 7.293129E+00 | loss scale: 32768.0 | grad norm: 735906.394 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.20 |
iteration 15212/ 292968 | consumed samples: 31154176 | consumed tokens: 14929379328 | elapsed time per iteration (ms): 118419.6 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 4.953250E+00 | loss scale: 32768.0 | grad norm: 895966.940 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.11 |
iteration 15213/ 292968 | consumed samples: 31156224 | consumed tokens: 14931214336 | elapsed time per iteration (ms): 119200.5 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.905317E+00 | loss scale: 32768.0 | grad norm: 93914.384 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.46 |
iteration 15214/ 292968 | consumed samples: 31158272 | consumed tokens: 14933049344 | elapsed time per iteration (ms): 119094.9 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.826337E+00 | loss scale: 32768.0 | grad norm: 24072.289 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.55 |
iteration 15215/ 292968 | consumed samples: 31160320 | consumed tokens: 14934884352 | elapsed time per iteration (ms): 118654.7 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.811136E+00 | loss scale: 32768.0 | grad norm: 42716.228 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.92 |
iteration 15216/ 292968 | consumed samples: 31162368 | consumed tokens: 14936719360 | elapsed time per iteration (ms): 121483.4 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.776839E+00 | loss scale: 32768.0 | grad norm: 30106.266 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.59 |
iteration 15217/ 292968 | consumed samples: 31164416 | consumed tokens: 14938554368 | elapsed time per iteration (ms): 118214.3 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.752849E+00 | loss scale: 32768.0 | grad norm: 30429.960 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.29 |
iteration 15218/ 292968 | consumed samples: 31166464 | consumed tokens: 14940389376 | elapsed time per iteration (ms): 118983.6 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.760829E+00 | loss scale: 32768.0 | grad norm: 29530.161 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.64 |
iteration 15219/ 292968 | consumed samples: 31168512 | consumed tokens: 14942224384 | elapsed time per iteration (ms): 119743.0 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.746614E+00 | loss scale: 32768.0 | grad norm: 26697.119 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.01 |
iteration 15220/ 292968 | consumed samples: 31170560 | consumed tokens: 14944059392 | elapsed time per iteration (ms): 117741.7 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.756798E+00 | loss scale: 32768.0 | grad norm: 28578.405 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.69 |
iteration 15221/ 292968 | consumed samples: 31172608 | consumed tokens: 14945894400 | elapsed time per iteration (ms): 117012.4 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.768408E+00 | loss scale: 32768.0 | grad norm: 49178.447 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.018 | TFLOPs: 101.32 |
iteration 15222/ 292968 | consumed samples: 31174656 | consumed tokens: 14947729408 | elapsed time per iteration (ms): 115527.0 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.757118E+00 | loss scale: 32768.0 | grad norm: 22801.751 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.018 | TFLOPs: 102.62 |
iteration 15223/ 292968 | consumed samples: 31176704 | consumed tokens: 14949564416 | elapsed time per iteration (ms): 117791.2 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.767524E+00 | loss scale: 32768.0 | grad norm: 34386.276 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.65 |
iteration 15224/ 292968 | consumed samples: 31178752 | consumed tokens: 14951399424 | elapsed time per iteration (ms): 117475.6 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.733172E+00 | loss scale: 32768.0 | grad norm: 23304.860 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.92 |
iteration 15225/ 292968 | consumed samples: 31180800 | consumed tokens: 14953234432 | elapsed time per iteration (ms): 118249.8 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.732929E+00 | loss scale: 32768.0 | grad norm: 27528.347 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.26 |
iteration 15226/ 292968 | consumed samples: 31182848 | consumed tokens: 14955069440 | elapsed time per iteration (ms): 117119.2 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 3.895649E+00 | loss scale: 32768.0 | grad norm: 1108306.999 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.23 |
iteration 15227/ 292968 | consumed samples: 31184896 | consumed tokens: 14956904448 | elapsed time per iteration (ms): 120012.3 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.831290E+00 | loss scale: 32768.0 | grad norm: 61406.568 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.79 |
iteration 15228/ 292968 | consumed samples: 31186944 | consumed tokens: 14958739456 | elapsed time per iteration (ms): 121488.3 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.774511E+00 | loss scale: 32768.0 | grad norm: 21519.784 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.58 |
iteration 15229/ 292968 | consumed samples: 31188992 | consumed tokens: 14960574464 | elapsed time per iteration (ms): 121825.3 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.819885E+00 | loss scale: 32768.0 | grad norm: 64907.817 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.31 |
iteration 15230/ 292968 | consumed samples: 31191040 | consumed tokens: 14962409472 | elapsed time per iteration (ms): 121386.9 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.799104E+00 | loss scale: 32768.0 | grad norm: 34739.032 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.67 |
iteration 15231/ 292968 | consumed samples: 31193088 | consumed tokens: 14964244480 | elapsed time per iteration (ms): 120768.2 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.775885E+00 | loss scale: 32768.0 | grad norm: 50023.669 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.17 |
iteration 15232/ 292968 | consumed samples: 31195136 | consumed tokens: 14966079488 | elapsed time per iteration (ms): 118656.0 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.767809E+00 | loss scale: 32768.0 | grad norm: 44245.307 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.91 |
iteration 15233/ 292968 | consumed samples: 31197184 | consumed tokens: 14967914496 | elapsed time per iteration (ms): 118559.6 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.766224E+00 | loss scale: 32768.0 | grad norm: 36145.527 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.00 |
iteration 15234/ 292968 | consumed samples: 31199232 | consumed tokens: 14969749504 | elapsed time per iteration (ms): 118323.6 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.770447E+00 | loss scale: 32768.0 | grad norm: 44315.590 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.19 |
iteration 15235/ 292968 | consumed samples: 31201280 | consumed tokens: 14971584512 | elapsed time per iteration (ms): 118484.4 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.745464E+00 | loss scale: 32768.0 | grad norm: 19534.073 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.06 |
iteration 15236/ 292968 | consumed samples: 31203328 | consumed tokens: 14973419520 | elapsed time per iteration (ms): 117891.8 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.757447E+00 | loss scale: 32768.0 | grad norm: 44108.073 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.56 |
iteration 15237/ 292968 | consumed samples: 31205376 | consumed tokens: 14975254528 | elapsed time per iteration (ms): 119716.8 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.738452E+00 | loss scale: 32768.0 | grad norm: 23260.303 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.03 |
iteration 15238/ 292968 | consumed samples: 31207424 | consumed tokens: 14977089536 | elapsed time per iteration (ms): 117118.1 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.743088E+00 | loss scale: 32768.0 | grad norm: 30230.467 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.23 |
iteration 15239/ 292968 | consumed samples: 31209472 | consumed tokens: 14978924544 | elapsed time per iteration (ms): 118528.4 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.747015E+00 | loss scale: 32768.0 | grad norm: 27546.060 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.02 |
iteration 15240/ 292968 | consumed samples: 31211520 | consumed tokens: 14980759552 | elapsed time per iteration (ms): 139734.2 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.739872E+00 | loss scale: 32768.0 | grad norm: 21386.281 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 84.84 |
iteration 15241/ 292968 | consumed samples: 31213568 | consumed tokens: 14982594560 | elapsed time per iteration (ms): 119012.0 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.723057E+00 | loss scale: 32768.0 | grad norm: 18627.763 | num zeros: 0.0 | curriculum seqlen: 896 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.62 |
iteration 15242/ 292968 | consumed samples: 31215616 | consumed tokens: 14984445952 | elapsed time per iteration (ms): 121806.8 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.745260E+00 | loss scale: 32768.0 | grad norm: 24592.210 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.20 |
iteration 15243/ 292968 | consumed samples: 31217664 | consumed tokens: 14986297344 | elapsed time per iteration (ms): 121506.8 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.729257E+00 | loss scale: 32768.0 | grad norm: 26045.623 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.44 |
iteration 15244/ 292968 | consumed samples: 31219712 | consumed tokens: 14988148736 | elapsed time per iteration (ms): 127178.2 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.744259E+00 | loss scale: 32768.0 | grad norm: 23916.988 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.05 |
iteration 15245/ 292968 | consumed samples: 31221760 | consumed tokens: 14990000128 | elapsed time per iteration (ms): 123659.3 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.727348E+00 | loss scale: 32768.0 | grad norm: 22607.585 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 96.73 |
iteration 15246/ 292968 | consumed samples: 31223808 | consumed tokens: 14991851520 | elapsed time per iteration (ms): 126021.5 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.712085E+00 | loss scale: 32768.0 | grad norm: 18256.245 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.91 |
iteration 15247/ 292968 | consumed samples: 31225856 | consumed tokens: 14993702912 | elapsed time per iteration (ms): 124198.0 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.730697E+00 | loss scale: 32768.0 | grad norm: 15875.459 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.31 |
iteration 15248/ 292968 | consumed samples: 31227904 | consumed tokens: 14995554304 | elapsed time per iteration (ms): 122380.2 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.728460E+00 | loss scale: 32768.0 | grad norm: 22945.779 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.74 |
iteration 15249/ 292968 | consumed samples: 31229952 | consumed tokens: 14997405696 | elapsed time per iteration (ms): 124162.2 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.722868E+00 | loss scale: 32768.0 | grad norm: 29103.242 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.34 |
iteration 15250/ 292968 | consumed samples: 31232000 | consumed tokens: 14999257088 | elapsed time per iteration (ms): 122377.9 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.737449E+00 | loss scale: 32768.0 | grad norm: 30310.561 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.74 |
saving checkpoint at iteration 15250 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
[2022-01-27 04:06:24,408] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/mp_rank_00_model_states.pt
[2022-01-27 04:06:24,412] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/mp_rank_01_model_states.pt
[2022-01-27 04:06:38,000] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_104_optim_states.pt
[2022-01-27 04:06:38,087] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_105_optim_states.pt
[2022-01-27 04:06:40,037] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_109_optim_states.pt
[2022-01-27 04:06:40,205] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_108_optim_states.pt
[2022-01-27 04:06:40,670] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_106_optim_states.pt
[2022-01-27 04:06:40,859] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_107_optim_states.pt
[2022-01-27 04:06:40,871] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_111_optim_states.pt
[2022-01-27 04:06:40,877] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_110_optim_states.pt
[2022-01-27 04:06:46,900] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_115_optim_states.pt
[2022-01-27 04:06:47,097] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_118_optim_states.pt
[2022-01-27 04:06:47,119] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_114_optim_states.pt
[2022-01-27 04:06:47,122] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_119_optim_states.pt
[2022-01-27 04:06:47,421] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_116_optim_states.pt
[2022-01-27 04:06:47,699] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_87_optim_states.pt
[2022-01-27 04:06:48,195] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_112_optim_states.pt
[2022-01-27 04:06:48,443] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_102_optim_states.pt
[2022-01-27 04:06:48,548] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_117_optim_states.pt
[2022-01-27 04:06:48,619] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_113_optim_states.pt
[2022-01-27 04:06:49,304] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_16_optim_states.pt
[2022-01-27 04:06:49,333] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_103_optim_states.pt
[2022-01-27 04:06:49,354] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_97_optim_states.pt
[2022-01-27 04:06:49,566] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_17_optim_states.pt
[2022-01-27 04:06:49,950] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_33_optim_states.pt
[2022-01-27 04:06:50,013] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_62_optim_states.pt
[2022-01-27 04:06:50,119] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_63_optim_states.pt
[2022-01-27 04:06:50,729] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_96_optim_states.pt
[2022-01-27 04:06:50,841] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_72_optim_states.pt
[2022-01-27 04:06:50,887] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_04_optim_states.pt
[2022-01-27 04:06:50,916] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_73_optim_states.pt
[2022-01-27 04:06:51,369] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_121_optim_states.pt
[2022-01-27 04:06:51,770] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_86_optim_states.pt
[2022-01-27 04:06:51,815] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_120_optim_states.pt
[2022-01-27 04:06:52,167] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_91_optim_states.pt
[2022-01-27 04:06:52,182] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_123_optim_states.pt
[2022-01-27 04:06:52,191] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_51_optim_states.pt
[2022-01-27 04:06:52,201] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_124_optim_states.pt
[2022-01-27 04:06:52,225] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_125_optim_states.pt
[2022-01-27 04:06:52,553] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_90_optim_states.pt
[2022-01-27 04:06:52,860] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_39_optim_states.pt
[2022-01-27 04:06:53,722] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_05_optim_states.pt
[2022-01-27 04:06:53,825] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_24_optim_states.pt
[2022-01-27 04:06:54,590] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_08_optim_states.pt
[2022-01-27 04:06:54,666] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_79_optim_states.pt
[2022-01-27 04:06:54,951] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_78_optim_states.pt
[2022-01-27 04:06:55,013] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_74_optim_states.pt
[2022-01-27 04:06:55,037] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_38_optim_states.pt
[2022-01-27 04:06:55,129] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_82_optim_states.pt
[2022-01-27 04:06:55,265] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_46_optim_states.pt
[2022-01-27 04:06:55,266] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_43_optim_states.pt
[2022-01-27 04:06:55,324] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_09_optim_states.pt
[2022-01-27 04:06:55,367] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_50_optim_states.pt
[2022-01-27 04:06:55,467] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_80_optim_states.pt
[2022-01-27 04:06:55,490] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_98_optim_states.pt
[2022-01-27 04:06:55,488] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_75_optim_states.pt
[2022-01-27 04:06:55,547] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_64_optim_states.pt
[2022-01-27 04:06:55,675] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_56_optim_states.pt
[2022-01-27 04:06:55,859] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_59_optim_states.pt
[2022-01-27 04:06:55,952] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_99_optim_states.pt
[2022-01-27 04:06:56,025] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_81_optim_states.pt
[2022-01-27 04:06:56,254] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_61_optim_states.pt
[2022-01-27 04:06:56,324] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_60_optim_states.pt
[2022-01-27 04:06:56,345] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_83_optim_states.pt
[2022-01-27 04:06:56,559] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_58_optim_states.pt
[2022-01-27 04:06:56,380] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_13_optim_states.pt
[2022-01-27 04:06:56,573] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_57_optim_states.pt
[2022-01-27 04:06:56,918] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_32_optim_states.pt
[2022-01-27 04:06:56,983] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_77_optim_states.pt
[2022-01-27 04:06:57,037] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_85_optim_states.pt
[2022-01-27 04:06:57,090] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_76_optim_states.pt
[2022-01-27 04:06:57,204] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_25_optim_states.pt
[2022-01-27 04:06:57,185] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_21_optim_states.pt
[2022-01-27 04:06:57,198] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_22_optim_states.pt
[2022-01-27 04:06:57,228] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_20_optim_states.pt
[2022-01-27 04:06:57,356] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_23_optim_states.pt
[2022-01-27 04:06:57,372] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_07_optim_states.pt
[2022-01-27 04:06:57,490] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_84_optim_states.pt
[2022-01-27 04:06:57,625] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_94_optim_states.pt
[2022-01-27 04:06:57,762] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_06_optim_states.pt
[2022-01-27 04:06:57,868] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_122_optim_states.pt
[2022-01-27 04:06:58,156] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_95_optim_states.pt
[2022-01-27 04:06:58,245] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_34_optim_states.pt
[2022-01-27 04:06:58,372] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_42_optim_states.pt
[2022-01-27 04:06:58,671] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_48_optim_states.pt
[2022-01-27 04:06:58,482] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_12_optim_states.pt
[2022-01-27 04:06:58,806] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_19_optim_states.pt
[2022-01-27 04:06:58,837] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_49_optim_states.pt
[2022-01-27 04:06:58,900] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_100_optim_states.pt
[2022-01-27 04:06:59,014] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_101_optim_states.pt
[2022-01-27 04:06:59,098] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_18_optim_states.pt
[2022-01-27 04:06:59,142] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_88_optim_states.pt
[2022-01-27 04:06:59,411] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_89_optim_states.pt
[2022-01-27 04:06:59,485] [INFO] [engine.py:3007:_save_zero_checkpoint]
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-01-27 04:07:00,070] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-01-27 04:07:00,540] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-01-27 04:07:00,592] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-01-27 04:07:00,469] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-01-27 04:07:01,378] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-01-27 04:07:01,524] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-01-27 04:07:02,021] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-01-27 04:07:02,279] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-01-27 04:07:02,417] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-01-27 04:07:02,512] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-01-27 04:07:02,759] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-01-27 04:07:03,122] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-01-27 04:07:03,167] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-01-27 04:07:03,191] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-01-27 04:07:03,315] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-01-27 04:07:03,344] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-01-27 04:07:03,154] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-01-27 
04:07:03,495] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-01-27 04:07:03,619] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-01-27 04:07:03,712] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-01-27 04:07:04,008] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-01-27 04:07:04,057] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-01-27 04:07:04,319] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-01-27 04:07:04,340] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-01-27 04:07:04,621] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-01-27 04:07:04,644] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-01-27 04:07:04,655] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-01-27 04:07:05,045] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-01-27 04:07:05,221] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-01-27 04:07:05,505] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-01-27 04:07:05,520] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-01-27 04:07:05,583] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-01-27 04:07:06,842] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-01-27 04:07:06,888] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-01-27 04:07:06,975] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-01-27 04:07:07,069] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15250/zero_pp_rank_0_mp_rank_26_optim_states.pt successfully saved checkpoint at iteration 15250 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 time (ms) | save-checkpoint: 47148.76 iteration 15251/ 292968 | consumed samples: 31234048 | consumed tokens: 15001108480 | elapsed time per iteration (ms): 177875.2 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.725391E+00 | loss scale: 32768.0 | grad norm: 30718.076 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 67.25 | iteration 15252/ 292968 | consumed samples: 31236096 | consumed tokens: 15002959872 | elapsed time per iteration (ms): 123769.7 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.735042E+00 | loss scale: 32768.0 | grad norm: 28797.319 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 96.64 | iteration 15253/ 292968 | consumed samples: 31238144 | consumed tokens: 15004811264 | elapsed time per iteration (ms): 124360.7 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.694785E+00 | loss scale: 32768.0 | grad norm: 20740.661 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.18 | iteration 15254/ 292968 | consumed samples: 31240192 | consumed tokens: 15006662656 | elapsed time per iteration (ms): 128313.4 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.721210E+00 | loss scale: 32768.0 | grad norm: 16326.906 | num 
zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 93.22 | iteration 15255/ 292968 | consumed samples: 31242240 | consumed tokens: 15008514048 | elapsed time per iteration (ms): 126999.2 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.730946E+00 | loss scale: 32768.0 | grad norm: 26668.416 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.18 | iteration 15256/ 292968 | consumed samples: 31244288 | consumed tokens: 15010365440 | elapsed time per iteration (ms): 126468.8 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.720527E+00 | loss scale: 32768.0 | grad norm: 28050.750 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.58 | iteration 15257/ 292968 | consumed samples: 31246336 | consumed tokens: 15012216832 | elapsed time per iteration (ms): 126839.8 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.728094E+00 | loss scale: 32768.0 | grad norm: 24500.068 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.30 | iteration 15258/ 292968 | consumed samples: 31248384 | consumed tokens: 15014068224 | elapsed time per iteration (ms): 125854.4 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.699615E+00 | loss scale: 32768.0 | grad norm: 25694.272 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.04 | iteration 15259/ 292968 | consumed samples: 31250432 | consumed tokens: 15015919616 | elapsed time per iteration (ms): 124495.4 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.721319E+00 | loss scale: 32768.0 | grad norm: 
23285.676 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.08 | iteration 15260/ 292968 | consumed samples: 31252480 | consumed tokens: 15017771008 | elapsed time per iteration (ms): 124129.8 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.733541E+00 | loss scale: 32768.0 | grad norm: 21844.244 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.36 | iteration 15261/ 292968 | consumed samples: 31254528 | consumed tokens: 15019622400 | elapsed time per iteration (ms): 124819.9 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.698672E+00 | loss scale: 32768.0 | grad norm: 24631.677 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.83 | iteration 15262/ 292968 | consumed samples: 31256576 | consumed tokens: 15021473792 | elapsed time per iteration (ms): 123672.2 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.715655E+00 | loss scale: 32768.0 | grad norm: 19909.099 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 96.72 | iteration 15263/ 292968 | consumed samples: 31258624 | consumed tokens: 15023325184 | elapsed time per iteration (ms): 122448.2 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.718589E+00 | loss scale: 32768.0 | grad norm: 26416.261 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.68 | iteration 15264/ 292968 | consumed samples: 31260672 | consumed tokens: 15025176576 | elapsed time per iteration (ms): 121888.7 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.736506E+00 | loss scale: 32768.0 | 
grad norm: 33406.565 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.13 | iteration 15265/ 292968 | consumed samples: 31262720 | consumed tokens: 15027027968 | elapsed time per iteration (ms): 121646.4 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.735388E+00 | loss scale: 32768.0 | grad norm: 17296.445 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.33 | iteration 15266/ 292968 | consumed samples: 31264768 | consumed tokens: 15028879360 | elapsed time per iteration (ms): 121591.3 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.726894E+00 | loss scale: 32768.0 | grad norm: 25727.653 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.37 | iteration 15267/ 292968 | consumed samples: 31266816 | consumed tokens: 15030730752 | elapsed time per iteration (ms): 123626.9 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.736690E+00 | loss scale: 32768.0 | grad norm: 34171.297 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 96.75 | iteration 15268/ 292968 | consumed samples: 31268864 | consumed tokens: 15032582144 | elapsed time per iteration (ms): 123848.1 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.750762E+00 | loss scale: 32768.0 | grad norm: 27050.392 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 96.58 | iteration 15269/ 292968 | consumed samples: 31270912 | consumed tokens: 15034433536 | elapsed time per iteration (ms): 123040.8 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.726560E+00 | loss scale: 
32768.0 | grad norm: 30372.164 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.21 | iteration 15270/ 292968 | consumed samples: 31272960 | consumed tokens: 15036284928 | elapsed time per iteration (ms): 124022.0 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.730447E+00 | loss scale: 32768.0 | grad norm: 27850.116 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 96.44 | iteration 15271/ 292968 | consumed samples: 31275008 | consumed tokens: 15038136320 | elapsed time per iteration (ms): 122517.4 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.697403E+00 | loss scale: 32768.0 | grad norm: 29982.162 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.63 | iteration 15272/ 292968 | consumed samples: 31277056 | consumed tokens: 15039987712 | elapsed time per iteration (ms): 122787.5 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.728583E+00 | loss scale: 32768.0 | grad norm: 29738.610 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.41 | iteration 15273/ 292968 | consumed samples: 31279104 | consumed tokens: 15041839104 | elapsed time per iteration (ms): 123012.7 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.717346E+00 | loss scale: 32768.0 | grad norm: 20754.041 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.24 | iteration 15274/ 292968 | consumed samples: 31281152 | consumed tokens: 15043690496 | elapsed time per iteration (ms): 122172.6 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.727647E+00 | 
loss scale: 32768.0 | grad norm: 30980.575 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.90 | iteration 15275/ 292968 | consumed samples: 31283200 | consumed tokens: 15045541888 | elapsed time per iteration (ms): 123552.6 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.738180E+00 | loss scale: 32768.0 | grad norm: 31644.249 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 96.81 | iteration 15276/ 292968 | consumed samples: 31285248 | consumed tokens: 15047393280 | elapsed time per iteration (ms): 122539.3 | learning rate: 5.956E-05 | global batch size: 2048 | lm loss: 2.757657E+00 | loss scale: 32768.0 | grad norm: 26660.031 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.61 | iteration 15277/ 292968 | consumed samples: 31287296 | consumed tokens: 15049244672 | elapsed time per iteration (ms): 121938.1 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.740378E+00 | loss scale: 32768.0 | grad norm: 26394.136 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.09 | iteration 15278/ 292968 | consumed samples: 31289344 | consumed tokens: 15051096064 | elapsed time per iteration (ms): 122461.6 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.727187E+00 | loss scale: 32768.0 | grad norm: 23720.969 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.67 | iteration 15279/ 292968 | consumed samples: 31291392 | consumed tokens: 15052947456 | elapsed time per iteration (ms): 120877.1 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 
2.733501E+00 | loss scale: 32768.0 | grad norm: 20814.797 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.95 | iteration 15280/ 292968 | consumed samples: 31293440 | consumed tokens: 15054798848 | elapsed time per iteration (ms): 123075.8 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.708609E+00 | loss scale: 32768.0 | grad norm: 17162.672 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.19 | iteration 15281/ 292968 | consumed samples: 31295488 | consumed tokens: 15056650240 | elapsed time per iteration (ms): 120496.9 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.731350E+00 | loss scale: 32768.0 | grad norm: 17284.007 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.27 | iteration 15282/ 292968 | consumed samples: 31297536 | consumed tokens: 15058501632 | elapsed time per iteration (ms): 121587.6 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.733435E+00 | loss scale: 32768.0 | grad norm: 18848.892 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.38 | iteration 15283/ 292968 | consumed samples: 31299584 | consumed tokens: 15060353024 | elapsed time per iteration (ms): 121820.1 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.748549E+00 | loss scale: 32768.0 | grad norm: 18698.437 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.19 | iteration 15284/ 292968 | consumed samples: 31301632 | consumed tokens: 15062204416 | elapsed time per iteration (ms): 122035.4 | learning rate: 5.955E-05 | global batch size: 2048 
| lm loss: 2.750732E+00 | loss scale: 32768.0 | grad norm: 22243.959 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.01 | iteration 15285/ 292968 | consumed samples: 31303680 | consumed tokens: 15064055808 | elapsed time per iteration (ms): 121816.8 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.744993E+00 | loss scale: 32768.0 | grad norm: 22645.792 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.19 | iteration 15286/ 292968 | consumed samples: 31305728 | consumed tokens: 15065907200 | elapsed time per iteration (ms): 122059.8 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.731462E+00 | loss scale: 32768.0 | grad norm: 26550.447 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.00 | iteration 15287/ 292968 | consumed samples: 31307776 | consumed tokens: 15067758592 | elapsed time per iteration (ms): 121004.9 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.734996E+00 | loss scale: 32768.0 | grad norm: 28774.841 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.85 | iteration 15288/ 292968 | consumed samples: 31309824 | consumed tokens: 15069609984 | elapsed time per iteration (ms): 123337.1 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.712829E+00 | loss scale: 32768.0 | grad norm: 28966.528 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 96.98 | iteration 15289/ 292968 | consumed samples: 31311872 | consumed tokens: 15071461376 | elapsed time per iteration (ms): 122976.9 | learning rate: 5.955E-05 | global batch 
size: 2048 | lm loss: 2.744341E+00 | loss scale: 32768.0 | grad norm: 18334.897 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.26 | iteration 15290/ 292968 | consumed samples: 31313920 | consumed tokens: 15073312768 | elapsed time per iteration (ms): 122119.6 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.717233E+00 | loss scale: 32768.0 | grad norm: 16240.774 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.95 | iteration 15291/ 292968 | consumed samples: 31315968 | consumed tokens: 15075164160 | elapsed time per iteration (ms): 121424.2 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.732023E+00 | loss scale: 32768.0 | grad norm: 20872.854 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.51 | iteration 15292/ 292968 | consumed samples: 31318016 | consumed tokens: 15077015552 | elapsed time per iteration (ms): 122848.9 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.725895E+00 | loss scale: 32768.0 | grad norm: 27458.682 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.37 | iteration 15293/ 292968 | consumed samples: 31320064 | consumed tokens: 15078866944 | elapsed time per iteration (ms): 121762.9 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.745987E+00 | loss scale: 32768.0 | grad norm: 26737.164 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.23 | iteration 15294/ 292968 | consumed samples: 31322112 | consumed tokens: 15080718336 | elapsed time per iteration (ms): 124507.2 | learning rate: 5.955E-05 | 
global batch size: 2048 | lm loss: 2.702706E+00 | loss scale: 32768.0 | grad norm: 26173.161 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.07 | iteration 15295/ 292968 | consumed samples: 31324160 | consumed tokens: 15082569728 | elapsed time per iteration (ms): 122049.7 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.735174E+00 | loss scale: 32768.0 | grad norm: 27678.667 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.00 | iteration 15296/ 292968 | consumed samples: 31326208 | consumed tokens: 15084421120 | elapsed time per iteration (ms): 122967.1 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.748112E+00 | loss scale: 32768.0 | grad norm: 27294.868 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.27 | iteration 15297/ 292968 | consumed samples: 31328256 | consumed tokens: 15086272512 | elapsed time per iteration (ms): 122940.6 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.720039E+00 | loss scale: 32768.0 | grad norm: 23504.147 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.29 | iteration 15298/ 292968 | consumed samples: 31330304 | consumed tokens: 15088123904 | elapsed time per iteration (ms): 122490.8 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.719100E+00 | loss scale: 32768.0 | grad norm: 24724.855 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.65 | iteration 15299/ 292968 | consumed samples: 31332352 | consumed tokens: 15089975296 | elapsed time per iteration (ms): 125293.5 | learning rate: 
5.955E-05 | global batch size: 2048 | lm loss: 2.740367E+00 | loss scale: 32768.0 | grad norm: 24949.120 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.47 | iteration 15300/ 292968 | consumed samples: 31334400 | consumed tokens: 15091826688 | elapsed time per iteration (ms): 123463.4 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.746138E+00 | loss scale: 32768.0 | grad norm: 24105.745 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 96.88 | ------------------------------------------------------------------------------------------- valid loss at iteration 15300 | lm loss value: 3.367565E+00 | lm loss PPL: 2.900780E+01 | ------------------------------------------------------------------------------------------- saving checkpoint at iteration 15300 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 [2022-01-27 05:56:15,689] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/mp_rank_01_model_states.pt [2022-01-27 05:56:15,853] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/mp_rank_00_model_states.pt [2022-01-27 05:56:27,174] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-01-27 05:56:27,658] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-01-27 05:56:29,264] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint 
saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-01-27 05:56:29,553] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-01-27 05:56:29,568] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-01-27 05:56:29,860] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-01-27 05:56:29,920] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-01-27 05:56:29,921] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-01-27 05:56:33,621] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-01-27 05:56:35,782] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-01-27 05:56:38,733] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-01-27 05:56:39,349] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-01-27 05:56:39,911] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-01-27 05:56:40,018] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-01-27 05:56:40,088] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-01-27 05:56:40,064] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-01-27 05:56:40,690] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-01-27 05:56:40,731] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-01-27 05:56:40,850] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-01-27 05:56:40,927] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-01-27 
05:56:41,046] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-01-27 05:56:41,886] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-01-27 05:56:42,148] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-01-27 05:56:42,182] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-01-27 05:56:42,189] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-01-27 05:56:42,275] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-01-27 05:56:42,429] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-01-27 05:56:42,565] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-01-27 05:56:42,458] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-01-27 05:56:42,810] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-01-27 05:56:43,055] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-01-27 05:56:43,488] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-01-27 05:56:43,596] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-01-27 05:56:43,638] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-01-27 05:56:43,788] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-01-27 05:56:43,871] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-01-27 05:56:43,868] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-01-27 05:56:43,912] [INFO] [engine.py:3007:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-01-27 05:56:43,751] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-01-27 05:56:43,798] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-01-27 05:56:44,056] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-01-27 05:56:44,216] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-01-27 05:56:44,249] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-01-27 05:56:44,307] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-01-27 05:56:44,763] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-01-27 05:56:44,801] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-01-27 05:56:44,802] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-01-27 05:56:44,835] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-01-27 05:56:44,860] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-01-27 05:56:44,984] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-01-27 05:56:45,064] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-01-27 05:56:45,179] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-01-27 05:56:45,202] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-01-27 05:56:45,331] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-01-27 05:56:45,392] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-01-27 
05:56:45,787] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-01-27 05:56:46,006] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-01-27 05:56:46,034] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-01-27 05:56:46,480] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-01-27 05:56:46,490] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-01-27 05:56:46,550] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-01-27 05:56:46,554] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-01-27 05:56:46,740] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-01-27 05:56:46,779] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-01-27 05:56:46,779] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-01-27 05:56:47,401] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-01-27 05:56:47,534] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-01-27 05:56:47,625] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-01-27 05:56:47,657] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-01-27 05:56:47,728] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-01-27 05:56:47,999] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-01-27 05:56:48,014] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-01-27 05:56:48,063] [INFO] [engine.py:3007:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-01-27 05:56:48,071] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-01-27 05:56:48,075] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_124_optim_states.pt [2022-01-27 05:56:48,120] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-01-27 05:56:48,192] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-01-27 05:56:48,288] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-01-27 05:56:48,325] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-01-27 05:56:48,775] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-01-27 05:56:48,791] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-01-27 05:56:49,186] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-01-27 05:56:49,355] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-01-27 05:56:49,446] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-01-27 05:56:49,403] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-01-27 05:56:49,363] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-01-27 05:56:49,573] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-01-27 05:56:49,714] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-01-27 05:56:50,424] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-01-27 05:56:50,658] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-01-27 
05:56:50,795] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-01-27 05:56:50,877] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-01-27 05:56:50,980] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-01-27 05:56:51,007] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-01-27 05:56:50,986] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-01-27 05:56:51,038] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-01-27 05:56:51,061] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-01-27 05:56:51,057] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-01-27 05:56:51,927] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-01-27 05:56:51,929] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-01-27 05:56:52,366] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-01-27 05:56:52,367] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-01-27 05:56:52,414] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-01-27 05:56:52,415] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-01-27 05:56:52,542] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-01-27 05:56:52,563] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-01-27 05:56:52,672] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-01-27 05:56:52,923] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-01-27 05:56:52,989] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-01-27 05:56:52,850] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-01-27 05:56:52,916] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-01-27 05:56:53,474] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-01-27 05:56:53,710] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-01-27 05:56:53,815] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-01-27 05:56:54,577] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-01-27 05:56:54,695] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-01-27 05:56:54,918] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-01-27 05:56:54,941] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-01-27 05:56:55,803] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-01-27 05:56:55,828] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-01-27 05:56:55,920] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-01-27 05:56:56,005] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-01-27 05:56:57,250] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-01-27 05:56:57,514] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-01-27 05:56:59,679] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-01-27 
05:56:59,841] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-01-27 05:57:00,299] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-01-27 05:57:00,480] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15300/zero_pp_rank_0_mp_rank_127_optim_states.pt successfully saved checkpoint at iteration 15300 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 time (ms) | save-checkpoint: 49431.29 iteration 15301/ 292968 | consumed samples: 31336448 | consumed tokens: 15093678080 | elapsed time per iteration (ms): 559172.5 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.738150E+00 | loss scale: 32768.0 | grad norm: 19173.925 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.004 | TFLOPs: 21.39 | iteration 15302/ 292968 | consumed samples: 31338496 | consumed tokens: 15095529472 | elapsed time per iteration (ms): 129204.7 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.730650E+00 | loss scale: 32768.0 | grad norm: 19459.982 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 92.58 | iteration 15303/ 292968 | consumed samples: 31340544 | consumed tokens: 15097380864 | elapsed time per iteration (ms): 127702.6 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.730108E+00 | loss scale: 32768.0 | grad norm: 19893.187 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 0.016 | TFLOPs: 93.67 |
iteration 15304/ 292968 | consumed samples: 31342592 | consumed tokens: 15099232256 | elapsed time per iteration (ms): 126468.0 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.752033E+00 | loss scale: 32768.0 | grad norm: 19185.022 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.58 |
iteration 15305/ 292968 | consumed samples: 31344640 | consumed tokens: 15101083648 | elapsed time per iteration (ms): 125850.3 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.743696E+00 | loss scale: 32768.0 | grad norm: 21619.134 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.04 |
iteration 15306/ 292968 | consumed samples: 31346688 | consumed tokens: 15102935040 | elapsed time per iteration (ms): 122632.1 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.732243E+00 | loss scale: 32768.0 | grad norm: 30200.292 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.54 |
iteration 15307/ 292968 | consumed samples: 31348736 | consumed tokens: 15104786432 | elapsed time per iteration (ms): 125203.8 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.742765E+00 | loss scale: 32768.0 | grad norm: 31553.346 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.53 |
iteration 15308/ 292968 | consumed samples: 31350784 | consumed tokens: 15106637824 | elapsed time per iteration (ms): 125115.3 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.708600E+00 | loss scale: 32768.0 | grad norm: 31209.057 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.60 |
iteration 15309/ 292968 | consumed samples: 31352832 | consumed tokens: 15108489216 | elapsed time per iteration (ms): 124030.2 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.748106E+00 | loss scale: 32768.0 | grad norm: 25384.466 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 96.44 |
iteration 15310/ 292968 | consumed samples: 31354880 | consumed tokens: 15110340608 | elapsed time per iteration (ms): 122385.8 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.728075E+00 | loss scale: 32768.0 | grad norm: 24286.365 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.73 |
iteration 15311/ 292968 | consumed samples: 31356928 | consumed tokens: 15112192000 | elapsed time per iteration (ms): 123493.3 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.742726E+00 | loss scale: 32768.0 | grad norm: 21903.094 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 96.86 |
iteration 15312/ 292968 | consumed samples: 31358976 | consumed tokens: 15114043392 | elapsed time per iteration (ms): 123116.5 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.704327E+00 | loss scale: 32768.0 | grad norm: 19834.016 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.15 |
iteration 15313/ 292968 | consumed samples: 31361024 | consumed tokens: 15115894784 | elapsed time per iteration (ms): 122883.6 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.728213E+00 | loss scale: 32768.0 | grad norm: 26047.745 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.34 |
iteration 15314/ 292968 | consumed samples: 31363072 | consumed tokens: 15117746176 | elapsed time per iteration (ms): 124477.6 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.733996E+00 | loss scale: 32768.0 | grad norm: 22353.504 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.09 |
iteration 15315/ 292968 | consumed samples: 31365120 | consumed tokens: 15119597568 | elapsed time per iteration (ms): 122724.0 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.738233E+00 | loss scale: 32768.0 | grad norm: 16673.113 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.46 |
iteration 15316/ 292968 | consumed samples: 31367168 | consumed tokens: 15121448960 | elapsed time per iteration (ms): 121682.0 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.708567E+00 | loss scale: 32768.0 | grad norm: 19324.921 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.30 |
iteration 15317/ 292968 | consumed samples: 31369216 | consumed tokens: 15123300352 | elapsed time per iteration (ms): 122712.0 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.698758E+00 | loss scale: 32768.0 | grad norm: 19359.803 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.47 |
iteration 15318/ 292968 | consumed samples: 31371264 | consumed tokens: 15125151744 | elapsed time per iteration (ms): 122419.3 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.730748E+00 | loss scale: 32768.0 | grad norm: 19442.202 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.71 |
iteration 15319/ 292968 | consumed samples: 31373312 | consumed tokens: 15127003136 | elapsed time per iteration (ms): 121673.1 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.741505E+00 | loss scale: 32768.0 | grad norm: 20723.193 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.31 |
iteration 15320/ 292968 | consumed samples: 31375360 | consumed tokens: 15128854528 | elapsed time per iteration (ms): 123007.4 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.725970E+00 | loss scale: 32768.0 | grad norm: 24610.947 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.24 |
iteration 15321/ 292968 | consumed samples: 31377408 | consumed tokens: 15130705920 | elapsed time per iteration (ms): 121747.5 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.725211E+00 | loss scale: 32768.0 | grad norm: 29511.103 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.25 |
iteration 15322/ 292968 | consumed samples: 31379456 | consumed tokens: 15132557312 | elapsed time per iteration (ms): 121279.7 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.709219E+00 | loss scale: 32768.0 | grad norm: 31032.417 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.63 |
iteration 15323/ 292968 | consumed samples: 31381504 | consumed tokens: 15134408704 | elapsed time per iteration (ms): 122354.9 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.728120E+00 | loss scale: 32768.0 | grad norm: 30901.615 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.76 |
iteration 15324/ 292968 | consumed samples: 31383552 | consumed tokens: 15136260096 | elapsed time per iteration (ms): 122357.2 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.720234E+00 | loss scale: 32768.0 | grad norm: 25594.841 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.76 |
iteration 15325/ 292968 | consumed samples: 31385600 | consumed tokens: 15138111488 | elapsed time per iteration (ms): 122725.0 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.727602E+00 | loss scale: 32768.0 | grad norm: 28692.347 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.46 |
iteration 15326/ 292968 | consumed samples: 31387648 | consumed tokens: 15139962880 | elapsed time per iteration (ms): 123974.5 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.722110E+00 | loss scale: 32768.0 | grad norm: 33580.573 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 96.48 |
iteration 15327/ 292968 | consumed samples: 31389696 | consumed tokens: 15141814272 | elapsed time per iteration (ms): 122878.0 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.723717E+00 | loss scale: 32768.0 | grad norm: 28359.648 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.34 |
iteration 15328/ 292968 | consumed samples: 31391744 | consumed tokens: 15143665664 | elapsed time per iteration (ms): 122463.3 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.730499E+00 | loss scale: 32768.0 | grad norm: 20285.801 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.67 |
iteration 15329/ 292968 | consumed samples: 31393792 | consumed tokens: 15145517056 | elapsed time per iteration (ms): 122999.8 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.730635E+00 | loss scale: 32768.0 | grad norm: 24483.729 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.25 |
iteration 15330/ 292968 | consumed samples: 31395840 | consumed tokens: 15147368448 | elapsed time per iteration (ms): 124063.2 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.741171E+00 | loss scale: 32768.0 | grad norm: 29353.899 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 96.41 |
iteration 15331/ 292968 | consumed samples: 31397888 | consumed tokens: 15149219840 | elapsed time per iteration (ms): 121657.7 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.760620E+00 | loss scale: 32768.0 | grad norm: 21369.755 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.32 |
iteration 15332/ 292968 | consumed samples: 31399936 | consumed tokens: 15151071232 | elapsed time per iteration (ms): 122933.5 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.759383E+00 | loss scale: 32768.0 | grad norm: 29062.319 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.30 |
iteration 15333/ 292968 | consumed samples: 31401984 | consumed tokens: 15152922624 | elapsed time per iteration (ms): 121550.0 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.748923E+00 | loss scale: 32768.0 | grad norm: 36615.689 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.41 |
iteration 15334/ 292968 | consumed samples: 31404032 | consumed tokens: 15154774016 | elapsed time per iteration (ms): 123085.0 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.759537E+00 | loss scale: 32768.0 | grad norm: 26783.116 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.18 |
iteration 15335/ 292968 | consumed samples: 31406080 | consumed tokens: 15156625408 | elapsed time per iteration (ms): 122404.6 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.760616E+00 | loss scale: 32768.0 | grad norm: 35748.168 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.72 |
iteration 15336/ 292968 | consumed samples: 31408128 | consumed tokens: 15158476800 | elapsed time per iteration (ms): 125235.6 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.744639E+00 | loss scale: 32768.0 | grad norm: 17518.310 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.51 |
iteration 15337/ 292968 | consumed samples: 31410176 | consumed tokens: 15160328192 | elapsed time per iteration (ms): 124822.7 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.705603E+00 | loss scale: 32768.0 | grad norm: 24447.617 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.83 |
iteration 15338/ 292968 | consumed samples: 31412224 | consumed tokens: 15162179584 | elapsed time per iteration (ms): 122875.7 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.777028E+00 | loss scale: 32768.0 | grad norm: 26088.171 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.34 |
iteration 15339/ 292968 | consumed samples: 31414272 | consumed tokens: 15164030976 | elapsed time per iteration (ms): 125302.6 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.721714E+00 | loss scale: 32768.0 | grad norm: 17111.486 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.46 |
iteration 15340/ 292968 | consumed samples: 31416320 | consumed tokens: 15165882368 | elapsed time per iteration (ms): 126503.2 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.760851E+00 | loss scale: 32768.0 | grad norm: 16528.837 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.55 |
iteration 15341/ 292968 | consumed samples: 31418368 | consumed tokens: 15167733760 | elapsed time per iteration (ms): 128002.0 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.745749E+00 | loss scale: 32768.0 | grad norm: 16922.948 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 93.45 |
iteration 15342/ 292968 | consumed samples: 31420416 | consumed tokens: 15169585152 | elapsed time per iteration (ms): 131414.0 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.730696E+00 | loss scale: 32768.0 | grad norm: 19031.192 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 91.02 |
iteration 15343/ 292968 | consumed samples: 31422464 | consumed tokens: 15171436544 | elapsed time per iteration (ms): 131674.0 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.715384E+00 | loss scale: 32768.0 | grad norm: 21153.232 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 90.84 |
iteration 15344/ 292968 | consumed samples: 31424512 | consumed tokens: 15173287936 | elapsed time per iteration (ms): 128436.0 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.725750E+00 | loss scale: 32768.0 | grad norm: 21516.645 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 93.13 |
iteration 15345/ 292968 | consumed samples: 31426560 | consumed tokens: 15175139328 | elapsed time per iteration (ms): 124832.4 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.733419E+00 | loss scale: 32768.0 | grad norm: 26782.065 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.82 |
iteration 15346/ 292968 | consumed samples: 31428608 | consumed tokens: 15176990720 | elapsed time per iteration (ms): 123863.0 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.727474E+00 | loss scale: 32768.0 | grad norm: 26950.549 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 96.57 |
iteration 15347/ 292968 | consumed samples: 31430656 | consumed tokens: 15178842112 | elapsed time per iteration (ms): 123302.4 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.710456E+00 | loss scale: 32768.0 | grad norm: 24674.631 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.01 |
iteration 15348/ 292968 | consumed samples: 31432704 | consumed tokens: 15180693504 | elapsed time per iteration (ms): 123104.7 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.729518E+00 | loss scale: 32768.0 | grad norm: 18341.999 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.16 |
iteration 15349/ 292968 | consumed samples: 31434752 | consumed tokens: 15182544896 | elapsed time per iteration (ms): 122770.4 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.719071E+00 | loss scale: 32768.0 | grad norm: 17562.120 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.43 |
iteration 15350/ 292968 | consumed samples: 31436800 | consumed tokens: 15184396288 | elapsed time per iteration (ms): 125171.5 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.689272E+00 | loss scale: 32768.0 | grad norm: 25267.289 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.56 |
saving checkpoint at iteration 15350 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
[2022-01-27 07:40:39,881] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/mp_rank_00_model_states.pt
[2022-01-27 07:40:39,939] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/mp_rank_01_model_states.pt
[2022-01-27 07:40:51,491] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_110_optim_states.pt
[2022-01-27 07:40:51,762] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_107_optim_states.pt
[2022-01-27 07:40:53,139] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_111_optim_states.pt
[2022-01-27 07:40:54,693] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_108_optim_states.pt
[2022-01-27 07:40:54,792] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_109_optim_states.pt
[2022-01-27 07:40:54,795] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_106_optim_states.pt
[2022-01-27 07:40:55,094] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_104_optim_states.pt
[2022-01-27 07:40:55,109] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_105_optim_states.pt
[2022-01-27 07:41:00,216] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_121_optim_states.pt
[2022-01-27 07:41:01,982] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_09_optim_states.pt
[2022-01-27 07:41:02,180] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_102_optim_states.pt
[2022-01-27 07:41:03,510] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_103_optim_states.pt
[2022-01-27 07:41:03,595] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_24_optim_states.pt
[2022-01-27 07:41:04,341] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_13_optim_states.pt
[2022-01-27 07:41:04,663] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_91_optim_states.pt
[2022-01-27 07:41:04,857] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_25_optim_states.pt
[2022-01-27 07:41:04,926] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_101_optim_states.pt
[2022-01-27 07:41:05,169] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_32_optim_states.pt
[2022-01-27 07:41:05,855] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_12_optim_states.pt
[2022-01-27 07:41:05,839] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_100_optim_states.pt
[2022-01-27 07:41:06,021] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_08_optim_states.pt
[2022-01-27 07:41:06,514] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_120_optim_states.pt
[2022-01-27 07:41:06,787] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_125_optim_states.pt
[2022-01-27 07:41:06,860] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_97_optim_states.pt
[2022-01-27 07:41:06,812] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_39_optim_states.pt
[2022-01-27 07:41:07,028] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_96_optim_states.pt
[2022-01-27 07:41:07,050] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_115_optim_states.pt
[2022-01-27 07:41:07,246] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_72_optim_states.pt
[2022-01-27 07:41:07,457] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_124_optim_states.pt
[2022-01-27 07:41:07,534] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_86_optim_states.pt
[2022-01-27 07:41:07,537] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_51_optim_states.pt
[2022-01-27 07:41:07,763] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_33_optim_states.pt
[2022-01-27 07:41:07,799] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_119_optim_states.pt
[2022-01-27 07:41:07,781] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_50_optim_states.pt
[2022-01-27 07:41:07,880] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_62_optim_states.pt
[2022-01-27 07:41:07,838] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_118_optim_states.pt
[2022-01-27 07:41:07,839] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_114_optim_states.pt
[2022-01-27 07:41:08,038] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_38_optim_states.pt
[2022-01-27 07:41:08,085] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_53_optim_states.pt
[2022-01-27 07:41:08,137] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_112_optim_states.pt
[2022-01-27 07:41:08,144] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_52_optim_states.pt
[2022-01-27 07:41:08,172] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_49_optim_states.pt
[2022-01-27 07:41:08,297] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_113_optim_states.pt
[2022-01-27 07:41:08,511] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_87_optim_states.pt
[2022-01-27 07:41:08,616] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_98_optim_states.pt
[2022-01-27 07:41:08,851] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_99_optim_states.pt
[2022-01-27 07:41:08,874] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_83_optim_states.pt
[2022-01-27 07:41:09,384] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_10_optim_states.pt
[2022-01-27 07:41:09,429] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_116_optim_states.pt
[2022-01-27 07:41:09,447] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_11_optim_states.pt
[2022-01-27 07:41:09,584] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_117_optim_states.pt
[2022-01-27 07:41:09,630] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_63_optim_states.pt
[2022-01-27 07:41:09,680] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_123_optim_states.pt
[2022-01-27 07:41:09,701] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_55_optim_states.pt
[2022-01-27 07:41:09,715] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_122_optim_states.pt
[2022-01-27 07:41:09,754] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_73_optim_states.pt
[2022-01-27 07:41:09,833] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_48_optim_states.pt
[2022-01-27 07:41:09,895] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_82_optim_states.pt
[2022-01-27 07:41:09,863] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_54_optim_states.pt
[2022-01-27 07:41:10,181] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_70_optim_states.pt
[2022-01-27 07:41:10,341] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_71_optim_states.pt
[2022-01-27 07:41:10,437] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_94_optim_states.pt
[2022-01-27 07:41:10,452] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_27_optim_states.pt
[2022-01-27 07:41:10,353] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_65_optim_states.pt
[2022-01-27 07:41:10,522] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_90_optim_states.pt
[2022-01-27 07:41:10,657] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_69_optim_states.pt
[2022-01-27 07:41:10,685] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_64_optim_states.pt
[2022-01-27 07:41:10,756] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_68_optim_states.pt
[2022-01-27 07:41:10,889] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_66_optim_states.pt
[2022-01-27 07:41:10,914] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_43_optim_states.pt
[2022-01-27 07:41:10,982] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_95_optim_states.pt
[2022-01-27 07:41:10,913] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_67_optim_states.pt
[2022-01-27 07:41:11,110] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_93_optim_states.pt
[2022-01-27 07:41:11,177] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_46_optim_states.pt
[2022-01-27 07:41:11,403] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_26_optim_states.pt
[2022-01-27 07:41:11,428] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_92_optim_states.pt
[2022-01-27 07:41:11,434] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_127_optim_states.pt
[2022-01-27 07:41:11,814] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_47_optim_states.pt
[2022-01-27 07:41:11,960] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_126_optim_states.pt
[2022-01-27 07:41:12,005] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_05_optim_states.pt
[2022-01-27 07:41:12,129] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_77_optim_states.pt
[2022-01-27 07:41:12,449] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_30_optim_states.pt
[2022-01-27 07:41:12,475] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_89_optim_states.pt
[2022-01-27 07:41:12,488] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_31_optim_states.pt
[2022-01-27 07:41:12,814] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_34_optim_states.pt
[2022-01-27 07:41:12,877] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_88_optim_states.pt
[2022-01-27 07:41:12,829] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_76_optim_states.pt
[2022-01-27 07:41:12,937] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_35_optim_states.pt
[2022-01-27 07:41:13,834] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_29_optim_states.pt
[2022-01-27 07:41:13,955] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_28_optim_states.pt
[2022-01-27 07:41:14,153] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_04_optim_states.pt
[2022-01-27 07:41:14,180] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_57_optim_states.pt
[2022-01-27 07:41:14,275] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_14_optim_states.pt
[2022-01-27 07:41:14,296] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_15_optim_states.pt
[2022-01-27 07:41:14,319] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_42_optim_states.pt
[2022-01-27 07:41:14,466] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2022-01-27 07:41:14,472] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_01_optim_states.pt
[2022-01-27 07:41:14,554] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_07_optim_states.pt
[2022-01-27 07:41:14,609] [INFO] [engine.py:3007:_save_zero_checkpoint] zero
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-01-27 07:41:15,129] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-01-27 07:41:15,503] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-01-27 07:41:15,592] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-01-27 07:41:15,691] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-01-27 07:41:15,986] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-01-27 07:41:15,989] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-01-27 07:41:16,024] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-01-27 07:41:16,076] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-01-27 07:41:16,214] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-01-27 07:41:16,235] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-01-27 07:41:16,462] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-01-27 07:41:16,649] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-01-27 07:41:16,774] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-01-27 07:41:17,162] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-01-27 07:41:17,188] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-01-27 07:41:17,420] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-01-27 07:41:17,492] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-01-27 
07:41:17,662] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-01-27 07:41:17,931] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-01-27 07:41:18,089] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-01-27 07:41:18,117] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-01-27 07:41:18,530] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-01-27 07:41:18,540] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-01-27 07:41:18,640] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-01-27 07:41:18,771] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-01-27 07:41:19,416] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-01-27 07:41:19,589] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-01-27 07:41:19,995] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-01-27 07:41:20,101] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15350/zero_pp_rank_0_mp_rank_74_optim_states.pt successfully saved checkpoint at iteration 15350 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 time (ms) | save-checkpoint: 44793.69 iteration 15351/ 292968 | consumed samples: 31438848 | consumed tokens: 15186247680 | elapsed time per iteration (ms): 168889.3 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.731218E+00 | loss scale: 32768.0 | grad norm: 29028.467 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 70.82 | iteration 15352/ 292968 | consumed samples: 31440896 | consumed tokens: 15188099072 | elapsed time per iteration (ms): 122489.2 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.693542E+00 | loss scale: 32768.0 | grad norm: 28420.586 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.65 | iteration 15353/ 292968 | consumed samples: 31442944 | consumed tokens: 15189950464 | elapsed time per iteration (ms): 123392.2 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.682698E+00 | loss scale: 32768.0 | 
grad norm: 17334.692 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 96.94 | iteration 15354/ 292968 | consumed samples: 31444992 | consumed tokens: 15191801856 | elapsed time per iteration (ms): 121421.7 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.715202E+00 | loss scale: 32768.0 | grad norm: 15913.540 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.51 | iteration 15355/ 292968 | consumed samples: 31447040 | consumed tokens: 15193653248 | elapsed time per iteration (ms): 121515.8 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.728105E+00 | loss scale: 32768.0 | grad norm: 17467.181 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.43 | iteration 15356/ 292968 | consumed samples: 31449088 | consumed tokens: 15195504640 | elapsed time per iteration (ms): 121711.4 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.690442E+00 | loss scale: 32768.0 | grad norm: 19761.797 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.28 | iteration 15357/ 292968 | consumed samples: 31451136 | consumed tokens: 15197356032 | elapsed time per iteration (ms): 120307.9 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.734651E+00 | loss scale: 32768.0 | grad norm: 23402.225 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.42 | iteration 15358/ 292968 | consumed samples: 31453184 | consumed tokens: 15199207424 | elapsed time per iteration (ms): 122167.3 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.727960E+00 | loss scale: 
32768.0 | grad norm: 22609.717 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.91 | iteration 15359/ 292968 | consumed samples: 31455232 | consumed tokens: 15201058816 | elapsed time per iteration (ms): 119965.9 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.698885E+00 | loss scale: 32768.0 | grad norm: 19875.327 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.71 | iteration 15360/ 292968 | consumed samples: 31457280 | consumed tokens: 15202910208 | elapsed time per iteration (ms): 120822.2 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.737026E+00 | loss scale: 32768.0 | grad norm: 23254.665 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.00 | iteration 15361/ 292968 | consumed samples: 31459328 | consumed tokens: 15204761600 | elapsed time per iteration (ms): 123403.1 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.735713E+00 | loss scale: 32768.0 | grad norm: 26217.108 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 96.93 | iteration 15362/ 292968 | consumed samples: 31461376 | consumed tokens: 15206612992 | elapsed time per iteration (ms): 124572.0 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.719513E+00 | loss scale: 32768.0 | grad norm: 23054.725 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.02 | iteration 15363/ 292968 | consumed samples: 31463424 | consumed tokens: 15208464384 | elapsed time per iteration (ms): 126215.0 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.713786E+00 | 
loss scale: 32768.0 | grad norm: 23537.612 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.77 | iteration 15364/ 292968 | consumed samples: 31465472 | consumed tokens: 15210315776 | elapsed time per iteration (ms): 125119.8 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.709743E+00 | loss scale: 32768.0 | grad norm: 24368.506 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.60 | iteration 15365/ 292968 | consumed samples: 31467520 | consumed tokens: 15212167168 | elapsed time per iteration (ms): 123725.2 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.730852E+00 | loss scale: 32768.0 | grad norm: 23288.033 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 96.68 | iteration 15366/ 292968 | consumed samples: 31469568 | consumed tokens: 15214018560 | elapsed time per iteration (ms): 123281.1 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.712655E+00 | loss scale: 32768.0 | grad norm: 26885.187 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.02 | iteration 15367/ 292968 | consumed samples: 31471616 | consumed tokens: 15215869952 | elapsed time per iteration (ms): 123763.2 | learning rate: 5.955E-05 | global batch size: 2048 | lm loss: 2.681406E+00 | loss scale: 32768.0 | grad norm: 28121.165 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 96.65 | iteration 15368/ 292968 | consumed samples: 31473664 | consumed tokens: 15217721344 | elapsed time per iteration (ms): 123282.4 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 
2.693801E+00 | loss scale: 32768.0 | grad norm: 26406.610 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.02 | iteration 15369/ 292968 | consumed samples: 31475712 | consumed tokens: 15219572736 | elapsed time per iteration (ms): 122425.5 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.729719E+00 | loss scale: 32768.0 | grad norm: 23384.828 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.70 | iteration 15370/ 292968 | consumed samples: 31477760 | consumed tokens: 15221424128 | elapsed time per iteration (ms): 121951.9 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.702058E+00 | loss scale: 32768.0 | grad norm: 21736.393 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.08 | iteration 15371/ 292968 | consumed samples: 31479808 | consumed tokens: 15223275520 | elapsed time per iteration (ms): 122291.1 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.697882E+00 | loss scale: 32768.0 | grad norm: 24192.310 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.81 | iteration 15372/ 292968 | consumed samples: 31481856 | consumed tokens: 15225126912 | elapsed time per iteration (ms): 121694.7 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.720616E+00 | loss scale: 32768.0 | grad norm: 29804.891 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.29 | iteration 15373/ 292968 | consumed samples: 31483904 | consumed tokens: 15226978304 | elapsed time per iteration (ms): 122155.0 | learning rate: 5.954E-05 | global batch size: 2048 
| lm loss: 2.711972E+00 | loss scale: 32768.0 | grad norm: 33514.971 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.92 | iteration 15374/ 292968 | consumed samples: 31485952 | consumed tokens: 15228829696 | elapsed time per iteration (ms): 122774.3 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.693480E+00 | loss scale: 32768.0 | grad norm: 29449.809 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.42 | iteration 15375/ 292968 | consumed samples: 31488000 | consumed tokens: 15230681088 | elapsed time per iteration (ms): 123199.5 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.719565E+00 | loss scale: 32768.0 | grad norm: 37756.408 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.09 | iteration 15376/ 292968 | consumed samples: 31490048 | consumed tokens: 15232532480 | elapsed time per iteration (ms): 121169.5 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.718655E+00 | loss scale: 32768.0 | grad norm: 27851.856 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.72 | iteration 15377/ 292968 | consumed samples: 31492096 | consumed tokens: 15234383872 | elapsed time per iteration (ms): 121167.1 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.689389E+00 | loss scale: 32768.0 | grad norm: 28641.965 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.72 | iteration 15378/ 292968 | consumed samples: 31494144 | consumed tokens: 15236235264 | elapsed time per iteration (ms): 122292.8 | learning rate: 5.954E-05 | global batch 
size: 2048 | lm loss: 2.695798E+00 | loss scale: 32768.0 | grad norm: 33908.284 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.81 | iteration 15379/ 292968 | consumed samples: 31496192 | consumed tokens: 15238086656 | elapsed time per iteration (ms): 122964.7 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.698298E+00 | loss scale: 32768.0 | grad norm: 24047.995 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.27 | iteration 15380/ 292968 | consumed samples: 31498240 | consumed tokens: 15239938048 | elapsed time per iteration (ms): 124418.0 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.699409E+00 | loss scale: 32768.0 | grad norm: 18597.114 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.14 | iteration 15381/ 292968 | consumed samples: 31500288 | consumed tokens: 15241789440 | elapsed time per iteration (ms): 123363.0 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.717857E+00 | loss scale: 32768.0 | grad norm: 25602.868 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 96.96 | iteration 15382/ 292968 | consumed samples: 31502336 | consumed tokens: 15243640832 | elapsed time per iteration (ms): 124219.2 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.709962E+00 | loss scale: 32768.0 | grad norm: 23625.692 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.29 | iteration 15383/ 292968 | consumed samples: 31504384 | consumed tokens: 15245492224 | elapsed time per iteration (ms): 124375.2 | learning rate: 5.954E-05 | 
global batch size: 2048 | lm loss: 2.725829E+00 | loss scale: 32768.0 | grad norm: 16907.323 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.17 | iteration 15384/ 292968 | consumed samples: 31506432 | consumed tokens: 15247343616 | elapsed time per iteration (ms): 122298.9 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.737748E+00 | loss scale: 32768.0 | grad norm: 25573.168 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.80 | iteration 15385/ 292968 | consumed samples: 31508480 | consumed tokens: 15249195008 | elapsed time per iteration (ms): 123555.9 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.720362E+00 | loss scale: 32768.0 | grad norm: 26025.310 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 96.81 | iteration 15386/ 292968 | consumed samples: 31510528 | consumed tokens: 15251046400 | elapsed time per iteration (ms): 122516.2 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.734437E+00 | loss scale: 32768.0 | grad norm: 19845.458 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.63 | iteration 15387/ 292968 | consumed samples: 31512576 | consumed tokens: 15252897792 | elapsed time per iteration (ms): 120698.2 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.703614E+00 | loss scale: 32768.0 | grad norm: 16840.720 | num zeros: 0.0 | curriculum seqlen: 904 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.10 | iteration 15388/ 292968 | consumed samples: 31514624 | consumed tokens: 15254765568 | elapsed time per iteration (ms): 121711.8 | learning rate: 
5.954E-05 | global batch size: 2048 | lm loss: 2.725096E+00 | loss scale: 32768.0 | grad norm: 19437.543 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.15 | iteration 15389/ 292968 | consumed samples: 31516672 | consumed tokens: 15256633344 | elapsed time per iteration (ms): 122278.7 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.695956E+00 | loss scale: 32768.0 | grad norm: 19606.142 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.69 | iteration 15390/ 292968 | consumed samples: 31518720 | consumed tokens: 15258501120 | elapsed time per iteration (ms): 123725.6 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.720148E+00 | loss scale: 32768.0 | grad norm: 17213.681 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.53 | iteration 15391/ 292968 | consumed samples: 31520768 | consumed tokens: 15260368896 | elapsed time per iteration (ms): 121006.4 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.714269E+00 | loss scale: 32768.0 | grad norm: 18523.189 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.72 | iteration 15392/ 292968 | consumed samples: 31522816 | consumed tokens: 15262236672 | elapsed time per iteration (ms): 119511.1 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.710197E+00 | loss scale: 32768.0 | grad norm: 25127.085 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.97 | iteration 15393/ 292968 | consumed samples: 31524864 | consumed tokens: 15264104448 | elapsed time per iteration (ms): 120730.4 | 
learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.707612E+00 | loss scale: 32768.0 | grad norm: 34035.109 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.95 | iteration 15394/ 292968 | consumed samples: 31526912 | consumed tokens: 15265972224 | elapsed time per iteration (ms): 120742.3 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.709263E+00 | loss scale: 32768.0 | grad norm: 35613.749 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.94 | iteration 15395/ 292968 | consumed samples: 31528960 | consumed tokens: 15267840000 | elapsed time per iteration (ms): 121932.8 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.692772E+00 | loss scale: 32768.0 | grad norm: 27587.123 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.97 | iteration 15396/ 292968 | consumed samples: 31531008 | consumed tokens: 15269707776 | elapsed time per iteration (ms): 122978.6 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.706587E+00 | loss scale: 32768.0 | grad norm: 28298.277 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.12 | iteration 15397/ 292968 | consumed samples: 31533056 | consumed tokens: 15271575552 | elapsed time per iteration (ms): 125332.2 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.723699E+00 | loss scale: 32768.0 | grad norm: 32413.395 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.28 | iteration 15398/ 292968 | consumed samples: 31535104 | consumed tokens: 15273443328 | elapsed time per iteration (ms): 
123321.0 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.712890E+00 | loss scale: 32768.0 | grad norm: 28862.817 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.85 |
iteration 15399/ 292968 | consumed samples: 31537152 | consumed tokens: 15275311104 | elapsed time per iteration (ms): 121853.1 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.700941E+00 | loss scale: 32768.0 | grad norm: 23645.736 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.03 |
iteration 15400/ 292968 | consumed samples: 31539200 | consumed tokens: 15277178880 | elapsed time per iteration (ms): 122324.8 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.700488E+00 | loss scale: 32768.0 | grad norm: 28085.078 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.65 |
saving checkpoint at iteration 15400 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
[2022-01-27 09:23:33,119] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/mp_rank_00_model_states.pt
[2022-01-27 09:23:33,197] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/mp_rank_01_model_states.pt
[2022-01-27 09:23:43,982] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_111_optim_states.pt
[2022-01-27 09:23:45,680] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_107_optim_states.pt
[2022-01-27 09:23:46,470] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_110_optim_states.pt
[2022-01-27 09:23:46,730] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_106_optim_states.pt
[2022-01-27 09:23:47,869] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_104_optim_states.pt
[2022-01-27 09:23:47,875] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_105_optim_states.pt
[2022-01-27 09:23:47,890] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_108_optim_states.pt
[2022-01-27 09:23:47,912] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_109_optim_states.pt
[2022-01-27 09:23:53,981] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_72_optim_states.pt
[2022-01-27 09:23:54,383] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_103_optim_states.pt
[2022-01-27 09:23:54,442] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_102_optim_states.pt
[2022-01-27 09:23:55,605] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_08_optim_states.pt
[2022-01-27 09:23:56,369] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_121_optim_states.pt
[2022-01-27 09:23:57,876] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_43_optim_states.pt
[2022-01-27 09:23:57,877] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_86_optim_states.pt
[2022-01-27 09:23:57,986] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_77_optim_states.pt
[2022-01-27 09:23:57,891] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_42_optim_states.pt
[2022-01-27 09:23:57,912] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_46_optim_states.pt
[2022-01-27 09:23:57,945] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_47_optim_states.pt
[2022-01-27 09:23:58,000] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_73_optim_states.pt
[2022-01-27 09:23:58,336] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_26_optim_states.pt
[2022-01-27 09:23:58,347] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_87_optim_states.pt
[2022-01-27 09:23:58,347] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_24_optim_states.pt
[2022-01-27 09:23:58,497] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_25_optim_states.pt
[2022-01-27 09:23:58,766] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_76_optim_states.pt
[2022-01-27 09:23:59,241] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_50_optim_states.pt
[2022-01-27 09:23:59,306] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_33_optim_states.pt
[2022-01-27 09:23:59,515] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_09_optim_states.pt
[2022-01-27 09:23:59,603] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_120_optim_states.pt
[2022-01-27 09:23:59,715] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_32_optim_states.pt
[2022-01-27 09:23:59,878] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_29_optim_states.pt
[2022-01-27 09:24:00,687] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_28_optim_states.pt
[2022-01-27 09:24:00,890] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_15_optim_states.pt
[2022-01-27 09:24:00,959] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_30_optim_states.pt
[2022-01-27 09:24:01,069] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_122_optim_states.pt
[2022-01-27 09:24:01,099] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_83_optim_states.pt
[2022-01-27 09:24:01,123] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_10_optim_states.pt
[2022-01-27 09:24:01,129] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_14_optim_states.pt
[2022-01-27 09:24:01,158] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_82_optim_states.pt
[2022-01-27 09:24:01,223] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_124_optim_states.pt
[2022-01-27 09:24:01,187] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_74_optim_states.pt
[2022-01-27 09:24:01,358] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_11_optim_states.pt
[2022-01-27 09:24:01,357] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_125_optim_states.pt
[2022-01-27 09:24:01,382] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_123_optim_states.pt
[2022-01-27 09:24:01,409] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_39_optim_states.pt
[2022-01-27 09:24:01,408] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_63_optim_states.pt
[2022-01-27 09:24:01,532] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_55_optim_states.pt
[2022-01-27 09:24:01,625] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_114_optim_states.pt
[2022-01-27 09:24:01,748] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_79_optim_states.pt
[2022-01-27 09:24:01,808] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_31_optim_states.pt
[2022-01-27 09:24:01,850] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_27_optim_states.pt
[2022-01-27 09:24:01,877] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_38_optim_states.pt
[2022-01-27 09:24:01,786] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_78_optim_states.pt
[2022-01-27 09:24:02,162] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_96_optim_states.pt
[2022-01-27 09:24:02,314] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_97_optim_states.pt
[2022-01-27 09:24:02,403] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_07_optim_states.pt
[2022-01-27 09:24:02,365] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_75_optim_states.pt
[2022-01-27 09:24:02,500] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_62_optim_states.pt
[2022-01-27 09:24:02,548] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_12_optim_states.pt
[2022-01-27 09:24:02,592] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_98_optim_states.pt
[2022-01-27 09:24:02,651] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_119_optim_states.pt
[2022-01-27 09:24:02,656] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_23_optim_states.pt
[2022-01-27 09:24:02,671] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_115_optim_states.pt
[2022-01-27 09:24:02,863] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_118_optim_states.pt
[2022-01-27 09:24:03,105] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_04_optim_states.pt
[2022-01-27 09:24:03,164] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_13_optim_states.pt
[2022-01-27 09:24:03,252] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_81_optim_states.pt
[2022-01-27 09:24:03,369] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_112_optim_states.pt
[2022-01-27 09:24:03,401] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_113_optim_states.pt
[2022-01-27 09:24:03,469] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_06_optim_states.pt
[2022-01-27 09:24:03,484] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_100_optim_states.pt
[2022-01-27 09:24:03,493] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_101_optim_states.pt
[2022-01-27 09:24:03,568] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_41_optim_states.pt
[2022-01-27 09:24:03,631] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_22_optim_states.pt
[2022-01-27 09:24:03,655] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_80_optim_states.pt
[2022-01-27 09:24:03,717] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_85_optim_states.pt
[2022-01-27 09:24:03,764] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_84_optim_states.pt
[2022-01-27 09:24:03,826] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_20_optim_states.pt
[2022-01-27 09:24:03,834] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_40_optim_states.pt
[2022-01-27 09:24:04,088] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_19_optim_states.pt
[2022-01-27 09:24:04,088] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_99_optim_states.pt
[2022-01-27 09:24:04,206] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_05_optim_states.pt
[2022-01-27 09:24:04,215] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_116_optim_states.pt
[2022-01-27 09:24:04,373] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_18_optim_states.pt
[2022-01-27 09:24:04,465] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_21_optim_states.pt
[2022-01-27 09:24:04,485] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_16_optim_states.pt
[2022-01-27 09:24:04,529] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_17_optim_states.pt
[2022-01-27 09:24:04,587] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_117_optim_states.pt
[2022-01-27 09:24:05,271] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_48_optim_states.pt
[2022-01-27 09:24:05,357] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_45_optim_states.pt
[2022-01-27 09:24:05,406] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_54_optim_states.pt
[2022-01-27 09:24:05,422] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_44_optim_states.pt
[2022-01-27 09:24:05,961] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_49_optim_states.pt
[2022-01-27 09:24:06,032] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_51_optim_states.pt
[2022-01-27 09:24:06,625] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_93_optim_states.pt
[2022-01-27 09:24:06,629] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_91_optim_states.pt
[2022-01-27 09:24:06,659] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_92_optim_states.pt
[2022-01-27 09:24:08,057] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_90_optim_states.pt
[2022-01-27 09:24:08,260] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_94_optim_states.pt
[2022-01-27 09:24:08,303] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_65_optim_states.pt
[2022-01-27 09:24:08,343] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_64_optim_states.pt
[2022-01-27 09:24:08,416] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_71_optim_states.pt
[2022-01-27 09:24:09,618] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_37_optim_states.pt
[2022-01-27 09:24:10,291] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_36_optim_states.pt
[2022-01-27 09:24:10,356] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_57_optim_states.pt
[2022-01-27 09:24:10,372] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_61_optim_states.pt
[2022-01-27 09:24:10,379] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_56_optim_states.pt
[2022-01-27 09:24:10,514] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_60_optim_states.pt
[2022-01-27 09:24:11,201] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_52_optim_states.pt
[2022-01-27 09:24:11,272] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_53_optim_states.pt
[2022-01-27 09:24:11,547] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_59_optim_states.pt
[2022-01-27 09:24:11,633] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_34_optim_states.pt
[2022-01-27 09:24:11,567] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_58_optim_states.pt
[2022-01-27 09:24:11,700] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_35_optim_states.pt
[2022-01-27 09:24:11,987] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_95_optim_states.pt
[2022-01-27 09:24:11,868] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_70_optim_states.pt
[2022-01-27 09:24:12,072] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_01_optim_states.pt
[2022-01-27 09:24:12,798] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2022-01-27 09:24:13,181] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_88_optim_states.pt
[2022-01-27 09:24:13,298] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_89_optim_states.pt
[2022-01-27 09:24:13,995] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_126_optim_states.pt
[2022-01-27 09:24:14,097] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_127_optim_states.pt
[2022-01-27 09:24:14,631] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_02_optim_states.pt
[2022-01-27 09:24:14,687] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_03_optim_states.pt
[2022-01-27 09:24:15,663] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_69_optim_states.pt
[2022-01-27 09:24:15,685] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_68_optim_states.pt
[2022-01-27 09:24:17,934] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_66_optim_states.pt
[2022-01-27 09:24:18,092] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15400/zero_pp_rank_0_mp_rank_67_optim_states.pt
successfully saved checkpoint at iteration 15400 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
time (ms) | save-checkpoint: 49759.30
iteration 15401/ 292968 | consumed samples: 31541248 | consumed tokens: 15279046656 | elapsed time per iteration (ms): 171699.7 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.707073E+00 | loss scale: 32768.0 | grad norm: 40091.081 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 70.28 |
iteration 15402/ 292968 | consumed samples: 31543296 | consumed tokens: 15280914432 | elapsed time per iteration (ms): 122240.3 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.720362E+00 | loss scale: 32768.0 | grad norm: 23661.007 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.72 |
iteration 15403/ 292968 | consumed samples: 31545344 | consumed tokens: 15282782208 | elapsed time per iteration (ms): 120393.1 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.718127E+00 | loss scale: 32768.0 | grad norm: 27276.940 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.23 |
iteration 15404/ 292968 | consumed samples: 31547392 | consumed tokens: 15284649984 | elapsed time per iteration (ms): 118818.6 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.707433E+00 | loss scale: 32768.0 | grad norm: 27165.043 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.56 |
iteration 15405/ 292968 | consumed samples: 31549440 | consumed tokens: 15286517760 | elapsed time per iteration (ms): 118765.5 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.697955E+00 | loss scale: 32768.0 | grad norm: 28610.673 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.60 |
iteration 15406/ 292968 | consumed samples: 31551488 | consumed tokens: 15288385536 | elapsed time per iteration (ms): 118516.7 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.723006E+00 | loss scale: 32768.0 | grad norm: 33728.621 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.82 |
iteration 15407/ 292968 | consumed samples: 31553536 | consumed tokens: 15290253312 | elapsed time per iteration (ms): 118280.8 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.717021E+00 | loss scale: 32768.0 | grad norm: 21189.777 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 102.02 |
iteration 15408/ 292968 | consumed samples: 31555584 | consumed tokens: 15292121088 | elapsed time per iteration (ms): 117639.3 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.712460E+00 | loss scale: 32768.0 | grad norm: 19274.620 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 102.58 |
iteration 15409/ 292968 | consumed samples: 31557632 | consumed tokens: 15293988864 | elapsed time per iteration (ms): 117902.1 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.710765E+00 | loss scale: 32768.0 | grad norm: 27957.414 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 102.35 |
iteration 15410/ 292968 | consumed samples: 31559680 | consumed tokens: 15295856640 | elapsed time per iteration (ms): 117723.6 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.700470E+00 | loss scale: 32768.0 | grad norm: 19711.071 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 102.50 |
iteration 15411/ 292968 | consumed samples: 31561728 | consumed tokens: 15297724416 | elapsed time per iteration (ms): 118318.1 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.682748E+00 | loss scale: 32768.0 | grad norm: 21233.693 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.99 |
iteration 15412/ 292968 | consumed samples: 31563776 | consumed tokens: 15299592192 | elapsed time per iteration (ms): 119615.5 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.693375E+00 | loss scale: 32768.0 | grad norm: 20709.601 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.88 |
iteration 15413/ 292968 | consumed samples: 31565824 | consumed tokens: 15301459968 | elapsed time per iteration (ms): 119856.1 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.696823E+00 | loss scale: 32768.0 | grad norm: 14591.791 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.68 |
iteration 15414/ 292968 | consumed samples: 31567872 | consumed tokens: 15303327744 | elapsed time per iteration (ms): 121135.7 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.714775E+00 | loss scale: 32768.0 | grad norm: 26749.503 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.62 |
iteration 15415/ 292968 | consumed samples: 31569920 | consumed tokens: 15305195520 | elapsed time per iteration (ms): 122344.8 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.715199E+00 | loss scale: 32768.0 | grad norm: 27578.985 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.63 |
iteration 15416/ 292968 | consumed samples: 31571968 | consumed tokens: 15307063296 | elapsed time per iteration (ms): 121146.5 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.702549E+00 | loss scale: 32768.0 | grad norm: 20630.009 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.61 |
iteration 15417/ 292968 | consumed samples: 31574016 | consumed tokens: 15308931072 | elapsed time per iteration (ms): 121371.7 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.704869E+00 | loss scale: 32768.0 | grad norm: 21059.435 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.42 |
iteration 15418/ 292968 | consumed samples: 31576064 | consumed tokens: 15310798848 | elapsed time per iteration (ms): 122454.0 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.708481E+00 | loss scale: 32768.0 | grad norm: 24547.575 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.54 |
iteration 15419/ 292968 | consumed samples: 31578112 | consumed tokens: 15312666624 | elapsed time per iteration (ms): 123144.6 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.724451E+00 | loss scale: 32768.0 | grad norm: 34699.413 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.99 |
iteration 15420/ 292968 | consumed samples: 31580160 | consumed tokens: 15314534400 | elapsed time per iteration (ms): 123204.0 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.718270E+00 | loss scale: 32768.0 | grad norm: 32778.516 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.94 |
iteration 15421/ 292968 | consumed samples: 31582208 | consumed tokens: 15316402176 | elapsed time per iteration (ms): 123090.0 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.715952E+00 | loss scale: 32768.0 | grad norm: 36436.489 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.03 |
iteration 15422/ 292968 | consumed samples: 31584256 | consumed tokens: 15318269952 | elapsed time per iteration (ms): 122213.3 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.720065E+00 | loss scale: 32768.0 | grad norm: 29391.816 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.74 |
iteration 15423/ 292968 | consumed samples: 31586304 | consumed tokens: 15320137728 | elapsed time per iteration (ms): 122187.2 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.735278E+00 | loss scale: 32768.0 | grad norm: 31980.557 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.76 |
iteration 15424/ 292968 | consumed samples: 31588352 | consumed tokens: 15322005504 | elapsed time per iteration (ms): 123581.7 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.736652E+00 | loss scale: 32768.0 | grad norm: 39995.803 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.64 |
iteration 15425/ 292968 | consumed samples: 31590400 | consumed tokens: 15323873280 | elapsed time per iteration (ms): 124467.7 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.724551E+00 | loss scale: 32768.0 | grad norm: 16894.665 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.95 |
iteration 15426/ 292968 | consumed samples: 31592448 | consumed tokens: 15325741056 | elapsed time per iteration (ms): 124211.6 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.707246E+00 | loss scale: 32768.0 | grad norm: 35126.995 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.15 |
iteration 15427/ 292968 | consumed samples: 31594496 | consumed tokens: 15327608832 | elapsed time per iteration (ms): 123489.3 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.705539E+00 | loss scale: 32768.0 | grad norm: 32894.117 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.72 |
iteration 15428/ 292968 | consumed samples: 31596544 | consumed tokens:
15329476608 | elapsed time per iteration (ms): 123156.1 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.701248E+00 | loss scale: 32768.0 | grad norm: 18444.528 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.98 | iteration 15429/ 292968 | consumed samples: 31598592 | consumed tokens: 15331344384 | elapsed time per iteration (ms): 122183.1 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.708805E+00 | loss scale: 32768.0 | grad norm: 23242.073 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.76 | iteration 15430/ 292968 | consumed samples: 31600640 | consumed tokens: 15333212160 | elapsed time per iteration (ms): 122387.2 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.700164E+00 | loss scale: 32768.0 | grad norm: 24324.382 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.60 | iteration 15431/ 292968 | consumed samples: 31602688 | consumed tokens: 15335079936 | elapsed time per iteration (ms): 121305.4 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.708877E+00 | loss scale: 32768.0 | grad norm: 17267.653 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.48 | iteration 15432/ 292968 | consumed samples: 31604736 | consumed tokens: 15336947712 | elapsed time per iteration (ms): 121870.0 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.705301E+00 | loss scale: 32768.0 | grad norm: 17354.471 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.02 | iteration 15433/ 292968 | consumed samples: 31606784 | 
consumed tokens: 15338815488 | elapsed time per iteration (ms): 121744.7 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.676364E+00 | loss scale: 32768.0 | grad norm: 20873.235 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.12 | iteration 15434/ 292968 | consumed samples: 31608832 | consumed tokens: 15340683264 | elapsed time per iteration (ms): 122662.1 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.719857E+00 | loss scale: 32768.0 | grad norm: 20092.744 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.38 | iteration 15435/ 292968 | consumed samples: 31610880 | consumed tokens: 15342551040 | elapsed time per iteration (ms): 122472.9 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.696934E+00 | loss scale: 32768.0 | grad norm: 18379.465 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.53 | iteration 15436/ 292968 | consumed samples: 31612928 | consumed tokens: 15344418816 | elapsed time per iteration (ms): 123271.3 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.706607E+00 | loss scale: 32768.0 | grad norm: 24839.765 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.89 | iteration 15437/ 292968 | consumed samples: 31614976 | consumed tokens: 15346286592 | elapsed time per iteration (ms): 122827.4 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.706925E+00 | loss scale: 32768.0 | grad norm: 29917.337 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.24 | iteration 15438/ 292968 | consumed samples: 
31617024 | consumed tokens: 15348154368 | elapsed time per iteration (ms): 123562.4 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.701807E+00 | loss scale: 32768.0 | grad norm: 36510.308 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.66 | iteration 15439/ 292968 | consumed samples: 31619072 | consumed tokens: 15350022144 | elapsed time per iteration (ms): 122670.6 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.706107E+00 | loss scale: 32768.0 | grad norm: 31999.603 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.37 | iteration 15440/ 292968 | consumed samples: 31621120 | consumed tokens: 15351889920 | elapsed time per iteration (ms): 124480.0 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.723281E+00 | loss scale: 32768.0 | grad norm: 41633.500 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.94 | iteration 15441/ 292968 | consumed samples: 31623168 | consumed tokens: 15353757696 | elapsed time per iteration (ms): 123170.6 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.700380E+00 | loss scale: 32768.0 | grad norm: 36321.995 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.97 | iteration 15442/ 292968 | consumed samples: 31625216 | consumed tokens: 15355625472 | elapsed time per iteration (ms): 125053.4 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.704407E+00 | loss scale: 32768.0 | grad norm: 20587.421 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.50 | iteration 15443/ 292968 | 
consumed samples: 31627264 | consumed tokens: 15357493248 | elapsed time per iteration (ms): 125197.3 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.716075E+00 | loss scale: 32768.0 | grad norm: 25131.026 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.38 | iteration 15444/ 292968 | consumed samples: 31629312 | consumed tokens: 15359361024 | elapsed time per iteration (ms): 129065.8 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.707724E+00 | loss scale: 32768.0 | grad norm: 29743.717 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 93.50 | iteration 15445/ 292968 | consumed samples: 31631360 | consumed tokens: 15361228800 | elapsed time per iteration (ms): 130689.1 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.739555E+00 | loss scale: 32768.0 | grad norm: 22290.785 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 92.33 | iteration 15446/ 292968 | consumed samples: 31633408 | consumed tokens: 15363096576 | elapsed time per iteration (ms): 132682.1 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.735341E+00 | loss scale: 32768.0 | grad norm: 20764.813 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 90.95 | iteration 15447/ 292968 | consumed samples: 31635456 | consumed tokens: 15364964352 | elapsed time per iteration (ms): 133157.2 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.713868E+00 | loss scale: 32768.0 | grad norm: 25200.578 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 90.62 | iteration 
15448/ 292968 | consumed samples: 31637504 | consumed tokens: 15366832128 | elapsed time per iteration (ms): 133696.3 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.683150E+00 | loss scale: 32768.0 | grad norm: 25564.059 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 90.26 |
iteration 15449/ 292968 | consumed samples: 31639552 | consumed tokens: 15368699904 | elapsed time per iteration (ms): 135762.3 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.690618E+00 | loss scale: 32768.0 | grad norm: 26360.263 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 88.88 |
iteration 15450/ 292968 | consumed samples: 31641600 | consumed tokens: 15370567680 | elapsed time per iteration (ms): 135694.6 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.699383E+00 | loss scale: 32768.0 | grad norm: 28985.218 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 88.93 |
-------------------------------------------------------------------------------------------
valid loss at iteration 15450 | lm loss value: 3.238402E+00 | lm loss PPL: 2.549295E+01 |
-------------------------------------------------------------------------------------------
saving checkpoint at iteration 15450 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
[2022-01-27 11:13:38,402] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/mp_rank_01_model_states.pt
[2022-01-27 11:13:38,424] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/mp_rank_00_model_states.pt
[2022-01-27 11:13:50,693] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-01-27 11:13:50,767] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-01-27 11:13:50,779] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-01-27 11:13:51,122] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-01-27 11:13:52,660] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-01-27 11:13:52,993] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-01-27 11:13:53,320] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-01-27 11:13:53,348] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-01-27 11:13:59,650] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-01-27 11:14:00,311] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-01-27 11:14:00,530] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-01-27 11:14:00,554] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-01-27 11:14:00,919] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-01-27 11:14:01,290] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-01-27 11:14:01,989] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-01-27 11:14:02,216] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-01-27 11:14:02,339] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-01-27 11:14:02,834] [INFO] [engine.py:3007:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-01-27 11:14:02,886] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-01-27 11:14:02,915] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-01-27 11:14:02,997] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-01-27 11:14:03,041] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-01-27 11:14:03,469] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-01-27 11:14:03,674] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-01-27 11:14:04,198] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-01-27 11:14:04,226] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-01-27 11:14:04,629] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-01-27 11:14:04,846] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-01-27 11:14:04,946] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-01-27 11:14:05,183] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-01-27 11:14:05,192] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-01-27 11:14:05,255] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-01-27 11:14:05,204] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-01-27 11:14:05,303] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-01-27 11:14:05,247] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-01-27 
11:14:05,451] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-01-27 11:14:05,477] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-01-27 11:14:05,527] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-01-27 11:14:05,603] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-01-27 11:14:05,672] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-01-27 11:14:05,780] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-01-27 11:14:06,275] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-01-27 11:14:06,340] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-01-27 11:14:06,385] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-01-27 11:14:06,447] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-01-27 11:14:06,498] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-01-27 11:14:06,550] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-01-27 11:14:06,641] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-01-27 11:14:06,684] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-01-27 11:14:07,172] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-01-27 11:14:07,544] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-01-27 11:14:07,754] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-01-27 11:14:07,810] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-01-27 11:14:07,866] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-01-27 11:14:07,989] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-01-27 11:14:08,016] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-01-27 11:14:08,104] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-01-27 11:14:08,165] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-01-27 11:14:08,309] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-01-27 11:14:08,330] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-01-27 11:14:08,686] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-01-27 11:14:08,732] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-01-27 11:14:08,741] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-01-27 11:14:09,157] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-01-27 11:14:09,195] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-01-27 11:14:09,270] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-01-27 11:14:09,414] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-01-27 11:14:09,931] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-01-27 11:14:10,005] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-01-27 11:14:10,014] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-01-27 
11:14:10,028] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-01-27 11:14:10,068] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-01-27 11:14:10,389] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-01-27 11:14:10,418] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-01-27 11:14:10,528] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-01-27 11:14:10,687] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-01-27 11:14:10,902] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-01-27 11:14:11,031] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-01-27 11:14:11,265] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-01-27 11:14:11,502] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-01-27 11:14:11,664] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-01-27 11:14:11,750] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-01-27 11:14:11,765] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-01-27 11:14:12,196] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-01-27 11:14:12,223] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-01-27 11:14:12,491] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-01-27 11:14:12,506] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-01-27 11:14:12,522] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-01-27 11:14:12,480] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-01-27 11:14:12,724] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-01-27 11:14:12,765] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-01-27 11:14:12,787] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_124_optim_states.pt [2022-01-27 11:14:12,788] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-01-27 11:14:12,865] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-01-27 11:14:12,920] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-01-27 11:14:13,071] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-01-27 11:14:13,685] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-01-27 11:14:13,882] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-01-27 11:14:13,917] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-01-27 11:14:13,946] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-01-27 11:14:14,249] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-01-27 11:14:14,359] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-01-27 11:14:14,310] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-01-27 11:14:14,561] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-01-27 11:14:14,646] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-01-27 
11:14:14,697] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-01-27 11:14:14,802] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-01-27 11:14:14,943] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-01-27 11:14:14,974] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-01-27 11:14:15,058] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-01-27 11:14:15,129] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-01-27 11:14:15,178] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-01-27 11:14:15,740] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-01-27 11:14:16,228] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-01-27 11:14:16,434] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-01-27 11:14:16,615] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-01-27 11:14:17,165] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-01-27 11:14:17,351] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-01-27 11:14:17,365] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-01-27 11:14:17,780] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-01-27 11:14:17,914] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-01-27 11:14:17,999] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-01-27 11:14:18,027] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-01-27 11:14:18,379] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-01-27 11:14:18,806] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-01-27 11:14:19,024] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-01-27 11:14:20,379] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-01-27 11:14:20,530] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15450/zero_pp_rank_0_mp_rank_126_optim_states.pt successfully saved checkpoint at iteration 15450 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 time (ms) | save-checkpoint: 47438.59 iteration 15451/ 292968 | consumed samples: 31643648 | consumed tokens: 15372435456 | elapsed time per iteration (ms): 568242.0 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.724660E+00 | loss scale: 32768.0 | grad norm: 31831.103 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.004 | TFLOPs: 21.24 | iteration 15452/ 292968 | consumed samples: 31645696 | consumed tokens: 15374303232 | elapsed time per iteration (ms): 133640.1 | learning rate: 5.954E-05 | global 
batch size: 2048 | lm loss: 2.693929E+00 | loss scale: 32768.0 | grad norm: 32578.806 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 90.30 | iteration 15453/ 292968 | consumed samples: 31647744 | consumed tokens: 15376171008 | elapsed time per iteration (ms): 131760.2 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.703214E+00 | loss scale: 32768.0 | grad norm: 27941.236 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 91.58 | iteration 15454/ 292968 | consumed samples: 31649792 | consumed tokens: 15378038784 | elapsed time per iteration (ms): 135202.3 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.686443E+00 | loss scale: 32768.0 | grad norm: 27502.055 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 89.25 | iteration 15455/ 292968 | consumed samples: 31651840 | consumed tokens: 15379906560 | elapsed time per iteration (ms): 133696.7 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.722364E+00 | loss scale: 32768.0 | grad norm: 29798.196 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 90.26 | iteration 15456/ 292968 | consumed samples: 31653888 | consumed tokens: 15381774336 | elapsed time per iteration (ms): 132953.8 | learning rate: 5.954E-05 | global batch size: 2048 | lm loss: 2.704609E+00 | loss scale: 32768.0 | grad norm: 30028.312 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 90.76 | iteration 15457/ 292968 | consumed samples: 31655936 | consumed tokens: 15383642112 | elapsed time per iteration (ms): 131370.5 | learning rate: 
5.954E-05 | global batch size: 2048 | lm loss: 2.715984E+00 | loss scale: 32768.0 | grad norm: 28535.931 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 91.86 | iteration 15458/ 292968 | consumed samples: 31657984 | consumed tokens: 15385509888 | elapsed time per iteration (ms): 133643.8 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.724964E+00 | loss scale: 32768.0 | grad norm: 25362.923 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 90.29 | iteration 15459/ 292968 | consumed samples: 31660032 | consumed tokens: 15387377664 | elapsed time per iteration (ms): 133397.4 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.719293E+00 | loss scale: 32768.0 | grad norm: 30356.194 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 90.46 | iteration 15460/ 292968 | consumed samples: 31662080 | consumed tokens: 15389245440 | elapsed time per iteration (ms): 136056.8 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.718869E+00 | loss scale: 32768.0 | grad norm: 30572.219 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 88.69 | iteration 15461/ 292968 | consumed samples: 31664128 | consumed tokens: 15391113216 | elapsed time per iteration (ms): 133849.5 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.708124E+00 | loss scale: 32768.0 | grad norm: 20873.681 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 90.15 | iteration 15462/ 292968 | consumed samples: 31666176 | consumed tokens: 15392980992 | elapsed time per iteration (ms): 136773.8 | 
learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.697734E+00 | loss scale: 32768.0 | grad norm: 20055.959 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 88.23 | iteration 15463/ 292968 | consumed samples: 31668224 | consumed tokens: 15394848768 | elapsed time per iteration (ms): 135037.4 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.693333E+00 | loss scale: 32768.0 | grad norm: 28868.506 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 89.36 | iteration 15464/ 292968 | consumed samples: 31670272 | consumed tokens: 15396716544 | elapsed time per iteration (ms): 132504.0 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.682912E+00 | loss scale: 32768.0 | grad norm: 26676.183 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 91.07 | iteration 15465/ 292968 | consumed samples: 31672320 | consumed tokens: 15398584320 | elapsed time per iteration (ms): 133938.4 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.675688E+00 | loss scale: 32768.0 | grad norm: 21005.372 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 90.09 | iteration 15466/ 292968 | consumed samples: 31674368 | consumed tokens: 15400452096 | elapsed time per iteration (ms): 134951.3 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.703567E+00 | loss scale: 32768.0 | grad norm: 23254.898 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 89.42 | iteration 15467/ 292968 | consumed samples: 31676416 | consumed tokens: 15402319872 | elapsed time per iteration (ms): 
128539.3 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.698839E+00 | loss scale: 32768.0 | grad norm: 23113.048 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 93.88 | iteration 15468/ 292968 | consumed samples: 31678464 | consumed tokens: 15404187648 | elapsed time per iteration (ms): 128713.2 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.693099E+00 | loss scale: 32768.0 | grad norm: 19777.964 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 93.75 | iteration 15469/ 292968 | consumed samples: 31680512 | consumed tokens: 15406055424 | elapsed time per iteration (ms): 128016.9 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.733178E+00 | loss scale: 32768.0 | grad norm: 13954.450 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.26 | iteration 15470/ 292968 | consumed samples: 31682560 | consumed tokens: 15407923200 | elapsed time per iteration (ms): 126117.4 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.703835E+00 | loss scale: 32768.0 | grad norm: 14087.483 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.68 | iteration 15471/ 292968 | consumed samples: 31684608 | consumed tokens: 15409790976 | elapsed time per iteration (ms): 126349.0 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.691762E+00 | loss scale: 32768.0 | grad norm: 17607.342 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.51 | iteration 15472/ 292968 | consumed samples: 31686656 | consumed tokens: 15411658752 | elapsed time per 
iteration (ms): 128478.3 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.699816E+00 | loss scale: 32768.0 | grad norm: 20342.292 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 93.92 | iteration 15473/ 292968 | consumed samples: 31688704 | consumed tokens: 15413526528 | elapsed time per iteration (ms): 124834.9 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.712617E+00 | loss scale: 32768.0 | grad norm: 29069.566 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.66 | iteration 15474/ 292968 | consumed samples: 31690752 | consumed tokens: 15415394304 | elapsed time per iteration (ms): 125307.5 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.680959E+00 | loss scale: 32768.0 | grad norm: 36598.988 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.30 | iteration 15475/ 292968 | consumed samples: 31692800 | consumed tokens: 15417262080 | elapsed time per iteration (ms): 125607.5 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.692810E+00 | loss scale: 32768.0 | grad norm: 27445.513 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.07 | iteration 15476/ 292968 | consumed samples: 31694848 | consumed tokens: 15419129856 | elapsed time per iteration (ms): 124534.9 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.718692E+00 | loss scale: 32768.0 | grad norm: 28269.659 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.90 | iteration 15477/ 292968 | consumed samples: 31696896 | consumed tokens: 15420997632 | 
elapsed time per iteration (ms): 123736.2 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.714819E+00 | loss scale: 32768.0 | grad norm: 32893.319 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.52 | iteration 15478/ 292968 | consumed samples: 31698944 | consumed tokens: 15422865408 | elapsed time per iteration (ms): 124561.4 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.702713E+00 | loss scale: 32768.0 | grad norm: 38869.454 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.88 | iteration 15479/ 292968 | consumed samples: 31700992 | consumed tokens: 15424733184 | elapsed time per iteration (ms): 124161.6 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.717513E+00 | loss scale: 32768.0 | grad norm: 26955.500 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.19 | iteration 15480/ 292968 | consumed samples: 31703040 | consumed tokens: 15426600960 | elapsed time per iteration (ms): 123802.9 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.684467E+00 | loss scale: 32768.0 | grad norm: 23603.840 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.47 | iteration 15481/ 292968 | consumed samples: 31705088 | consumed tokens: 15428468736 | elapsed time per iteration (ms): 124134.4 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.694800E+00 | loss scale: 32768.0 | grad norm: 31411.980 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.21 | iteration 15482/ 292968 | consumed samples: 31707136 | consumed tokens: 
15430336512 | elapsed time per iteration (ms): 123099.9 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.721876E+00 | loss scale: 32768.0 | grad norm: 36140.809 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.03 | iteration 15483/ 292968 | consumed samples: 31709184 | consumed tokens: 15432204288 | elapsed time per iteration (ms): 123727.2 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.713735E+00 | loss scale: 32768.0 | grad norm: 32295.171 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.53 | iteration 15484/ 292968 | consumed samples: 31711232 | consumed tokens: 15434072064 | elapsed time per iteration (ms): 124039.3 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.724758E+00 | loss scale: 32768.0 | grad norm: 30100.671 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.28 | iteration 15485/ 292968 | consumed samples: 31713280 | consumed tokens: 15435939840 | elapsed time per iteration (ms): 123678.6 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.709995E+00 | loss scale: 32768.0 | grad norm: 30892.352 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.57 | iteration 15486/ 292968 | consumed samples: 31715328 | consumed tokens: 15437807616 | elapsed time per iteration (ms): 124815.6 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.713182E+00 | loss scale: 32768.0 | grad norm: 39666.758 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.68 | iteration 15487/ 292968 | consumed samples: 31717376 | 
consumed tokens: 15439675392 | elapsed time per iteration (ms): 123329.5 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.710817E+00 | loss scale: 32768.0 | grad norm: 27198.686 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.84 | iteration 15488/ 292968 | consumed samples: 31719424 | consumed tokens: 15441543168 | elapsed time per iteration (ms): 123336.6 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.695730E+00 | loss scale: 32768.0 | grad norm: 27805.873 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.84 | iteration 15489/ 292968 | consumed samples: 31721472 | consumed tokens: 15443410944 | elapsed time per iteration (ms): 124660.8 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.728303E+00 | loss scale: 32768.0 | grad norm: 31392.018 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.80 | iteration 15490/ 292968 | consumed samples: 31723520 | consumed tokens: 15445278720 | elapsed time per iteration (ms): 122878.5 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.716000E+00 | loss scale: 32768.0 | grad norm: 29383.345 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.20 | iteration 15491/ 292968 | consumed samples: 31725568 | consumed tokens: 15447146496 | elapsed time per iteration (ms): 122104.0 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.690871E+00 | loss scale: 32768.0 | grad norm: 24740.840 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.83 | iteration 15492/ 292968 | consumed samples: 
31727616 | consumed tokens: 15449014272 | elapsed time per iteration (ms): 122236.1 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.699353E+00 | loss scale: 32768.0 | grad norm: 26378.130 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.72 | iteration 15493/ 292968 | consumed samples: 31729664 | consumed tokens: 15450882048 | elapsed time per iteration (ms): 124487.7 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.739936E+00 | loss scale: 32768.0 | grad norm: 28730.821 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.93 | iteration 15494/ 292968 | consumed samples: 31731712 | consumed tokens: 15452749824 | elapsed time per iteration (ms): 124501.3 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.722945E+00 | loss scale: 32768.0 | grad norm: 26104.362 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.92 | iteration 15495/ 292968 | consumed samples: 31733760 | consumed tokens: 15454617600 | elapsed time per iteration (ms): 122969.4 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.706856E+00 | loss scale: 32768.0 | grad norm: 20038.973 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.13 | iteration 15496/ 292968 | consumed samples: 31735808 | consumed tokens: 15456485376 | elapsed time per iteration (ms): 123455.9 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.707979E+00 | loss scale: 32768.0 | grad norm: 21168.662 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.74 | iteration 15497/ 292968 | 
consumed samples: 31737856 | consumed tokens: 15458353152 | elapsed time per iteration (ms): 122375.8 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.722990E+00 | loss scale: 32768.0 | grad norm: 28759.730 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.61 | iteration 15498/ 292968 | consumed samples: 31739904 | consumed tokens: 15460220928 | elapsed time per iteration (ms): 124207.0 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.729968E+00 | loss scale: 32768.0 | grad norm: 30859.860 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.15 | iteration 15499/ 292968 | consumed samples: 31741952 | consumed tokens: 15462088704 | elapsed time per iteration (ms): 123326.5 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.703834E+00 | loss scale: 32768.0 | grad norm: 28810.440 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.85 | iteration 15500/ 292968 | consumed samples: 31744000 | consumed tokens: 15463956480 | elapsed time per iteration (ms): 125173.3 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.712369E+00 | loss scale: 32768.0 | grad norm: 22595.890 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.40 | saving checkpoint at iteration 15500 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 [2022-01-27 13:00:41,067] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/mp_rank_00_model_states.pt [2022-01-27 13:00:41,472] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/mp_rank_01_model_states.pt [2022-01-27 13:00:54,703] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-01-27 13:00:55,905] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-01-27 13:00:56,166] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-01-27 13:00:56,415] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-01-27 13:00:57,369] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-01-27 13:00:57,670] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-01-27 13:00:57,834] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-01-27 13:00:57,860] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-01-27 13:01:02,974] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-01-27 13:01:03,328] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-01-27 13:01:03,570] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-01-27 13:01:03,942] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-01-27 13:01:04,174] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-01-27 13:01:04,213] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-01-27 13:01:04,342] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-01-27 13:01:04,652] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-01-27 13:01:04,862] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-01-27 13:01:04,849] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-01-27 13:01:05,236] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-01-27 13:01:05,243] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-01-27 13:01:05,416] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-01-27 13:01:05,431] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-01-27 13:01:05,463] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-01-27 13:01:05,523] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-01-27 13:01:05,590] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-01-27 13:01:05,605] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-01-27 
13:01:06,081] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-01-27 13:01:06,147] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-01-27 13:01:06,281] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-01-27 13:01:06,346] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-01-27 13:01:06,866] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-01-27 13:01:07,586] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-01-27 13:01:07,771] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-01-27 13:01:08,003] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-01-27 13:01:08,183] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-01-27 13:01:08,310] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-01-27 13:01:08,437] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-01-27 13:01:08,578] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-01-27 13:01:08,685] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-01-27 13:01:08,742] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-01-27 13:01:09,042] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-01-27 13:01:09,233] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-01-27 13:01:09,249] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-01-27 13:01:09,332] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-01-27 13:01:09,367] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-01-27 13:01:09,503] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-01-27 13:01:09,591] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-01-27 13:01:09,656] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-01-27 13:01:09,849] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-01-27 13:01:10,330] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-01-27 13:01:10,509] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-01-27 13:01:10,803] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-01-27 13:01:10,999] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-01-27 13:01:11,080] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-01-27 13:01:11,187] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-01-27 13:01:11,411] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-01-27 13:01:11,438] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-01-27 13:01:11,451] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-01-27 13:01:11,649] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-01-27 13:01:11,721] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-01-27 13:01:12,061] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-01-27 
13:01:12,111] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-01-27 13:01:12,169] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-01-27 13:01:12,374] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-01-27 13:01:12,357] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-01-27 13:01:12,652] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-01-27 13:01:12,744] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-01-27 13:01:12,928] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-01-27 13:01:12,914] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-01-27 13:01:12,992] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-01-27 13:01:12,994] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-01-27 13:01:12,995] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-01-27 13:01:13,069] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-01-27 13:01:13,074] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-01-27 13:01:13,097] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-01-27 13:01:13,095] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-01-27 13:01:13,145] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-01-27 13:01:13,227] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-01-27 13:01:13,611] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-01-27 13:01:13,716] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-01-27 13:01:13,718] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-01-27 13:01:13,900] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-01-27 13:01:14,051] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-01-27 13:01:14,078] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-01-27 13:01:14,180] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-01-27 13:01:14,247] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-01-27 13:01:14,249] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-01-27 13:01:14,315] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-01-27 13:01:15,211] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-01-27 13:01:15,260] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-01-27 13:01:15,529] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-01-27 13:01:15,567] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-01-27 13:01:15,589] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-01-27 13:01:15,713] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-01-27 13:01:15,781] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-01-27 13:01:15,885] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-01-27 
13:01:15,938] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_124_optim_states.pt [2022-01-27 13:01:15,943] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-01-27 13:01:16,013] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-01-27 13:01:16,858] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-01-27 13:01:17,213] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-01-27 13:01:17,737] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-01-27 13:01:18,209] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-01-27 13:01:18,578] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-01-27 13:01:18,688] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-01-27 13:01:18,697] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-01-27 13:01:18,708] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-01-27 13:01:18,807] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-01-27 13:01:18,851] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-01-27 13:01:18,856] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-01-27 13:01:18,968] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-01-27 13:01:18,974] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-01-27 13:01:19,005] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-01-27 13:01:19,183] [INFO] [engine.py:3007:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-01-27 13:01:19,472] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-01-27 13:01:20,436] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-01-27 13:01:20,454] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-01-27 13:01:20,460] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-01-27 13:01:20,516] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-01-27 13:01:20,594] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-01-27 13:01:20,676] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-01-27 13:01:20,738] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-01-27 13:01:20,760] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_89_optim_states.pt
[2022-01-27 13:01:20,835] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_19_optim_states.pt
[2022-01-27 13:01:20,882] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_126_optim_states.pt
[2022-01-27 13:01:20,896] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_18_optim_states.pt
[2022-01-27 13:01:21,569] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_20_optim_states.pt
[2022-01-27 13:01:21,637] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15500/zero_pp_rank_0_mp_rank_21_optim_states.pt
successfully saved checkpoint at iteration 15500 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
time (ms) | save-checkpoint: 44414.07
iteration 15501/ 292968 | consumed samples: 31746048 | consumed tokens: 15465824256 | elapsed time per iteration (ms): 166972.3 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.747991E+00 | loss scale: 32768.0 | grad norm: 18585.672 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 72.27 |
iteration 15502/ 292968 | consumed samples: 31748096 | consumed tokens: 15467692032 | elapsed time per iteration (ms):
126535.1 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.748973E+00 | loss scale: 32768.0 | grad norm: 21968.785 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.37 | iteration 15503/ 292968 | consumed samples: 31750144 | consumed tokens: 15469559808 | elapsed time per iteration (ms): 123437.2 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.716420E+00 | loss scale: 32768.0 | grad norm: 18983.977 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.76 | iteration 15504/ 292968 | consumed samples: 31752192 | consumed tokens: 15471427584 | elapsed time per iteration (ms): 124545.9 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.747393E+00 | loss scale: 32768.0 | grad norm: 23430.161 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.89 | iteration 15505/ 292968 | consumed samples: 31754240 | consumed tokens: 15473295360 | elapsed time per iteration (ms): 124551.4 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.729638E+00 | loss scale: 32768.0 | grad norm: 21922.473 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.88 | iteration 15506/ 292968 | consumed samples: 31756288 | consumed tokens: 15475163136 | elapsed time per iteration (ms): 124046.6 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.707652E+00 | loss scale: 32768.0 | grad norm: 21108.776 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.28 | iteration 15507/ 292968 | consumed samples: 31758336 | consumed tokens: 15477030912 | elapsed time per 
iteration (ms): 124598.8 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.713505E+00 | loss scale: 32768.0 | grad norm: 17965.309 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.85 | iteration 15508/ 292968 | consumed samples: 31760384 | consumed tokens: 15478898688 | elapsed time per iteration (ms): 123671.5 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.744891E+00 | loss scale: 32768.0 | grad norm: 19899.260 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.57 | iteration 15509/ 292968 | consumed samples: 31762432 | consumed tokens: 15480766464 | elapsed time per iteration (ms): 123984.3 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.714315E+00 | loss scale: 32768.0 | grad norm: 16275.607 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.33 | iteration 15510/ 292968 | consumed samples: 31764480 | consumed tokens: 15482634240 | elapsed time per iteration (ms): 123465.6 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.716850E+00 | loss scale: 32768.0 | grad norm: 13684.808 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.74 | iteration 15511/ 292968 | consumed samples: 31766528 | consumed tokens: 15484502016 | elapsed time per iteration (ms): 123294.7 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.730458E+00 | loss scale: 32768.0 | grad norm: 15743.864 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.87 | iteration 15512/ 292968 | consumed samples: 31768576 | consumed tokens: 15486369792 | 
elapsed time per iteration (ms): 123462.4 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.703342E+00 | loss scale: 32768.0 | grad norm: 16907.486 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.74 | iteration 15513/ 292968 | consumed samples: 31770624 | consumed tokens: 15488237568 | elapsed time per iteration (ms): 123939.5 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.712699E+00 | loss scale: 32768.0 | grad norm: 16057.724 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.36 | iteration 15514/ 292968 | consumed samples: 31772672 | consumed tokens: 15490105344 | elapsed time per iteration (ms): 123473.0 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.693842E+00 | loss scale: 32768.0 | grad norm: 21733.541 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.73 | iteration 15515/ 292968 | consumed samples: 31774720 | consumed tokens: 15491973120 | elapsed time per iteration (ms): 124382.4 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.721943E+00 | loss scale: 32768.0 | grad norm: 24154.878 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.02 | iteration 15516/ 292968 | consumed samples: 31776768 | consumed tokens: 15493840896 | elapsed time per iteration (ms): 124307.3 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.702196E+00 | loss scale: 32768.0 | grad norm: 32170.921 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.07 | iteration 15517/ 292968 | consumed samples: 31778816 | consumed tokens: 
15495708672 | elapsed time per iteration (ms): 125908.7 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.731901E+00 | loss scale: 32768.0 | grad norm: 42707.845 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.84 | iteration 15518/ 292968 | consumed samples: 31780864 | consumed tokens: 15497576448 | elapsed time per iteration (ms): 124986.0 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.686913E+00 | loss scale: 32768.0 | grad norm: 24795.581 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.55 | iteration 15519/ 292968 | consumed samples: 31782912 | consumed tokens: 15499444224 | elapsed time per iteration (ms): 127074.7 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.713937E+00 | loss scale: 32768.0 | grad norm: 22147.976 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.96 | iteration 15520/ 292968 | consumed samples: 31784960 | consumed tokens: 15501312000 | elapsed time per iteration (ms): 125080.7 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.695007E+00 | loss scale: 32768.0 | grad norm: 31537.178 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.47 | iteration 15521/ 292968 | consumed samples: 31787008 | consumed tokens: 15503179776 | elapsed time per iteration (ms): 124692.6 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.709055E+00 | loss scale: 32768.0 | grad norm: 36692.349 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.77 | iteration 15522/ 292968 | consumed samples: 31789056 | 
consumed tokens: 15505047552 | elapsed time per iteration (ms): 125894.4 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.709615E+00 | loss scale: 32768.0 | grad norm: 22157.746 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.85 | iteration 15523/ 292968 | consumed samples: 31791104 | consumed tokens: 15506915328 | elapsed time per iteration (ms): 124135.1 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.694861E+00 | loss scale: 32768.0 | grad norm: 27804.394 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.21 | iteration 15524/ 292968 | consumed samples: 31793152 | consumed tokens: 15508783104 | elapsed time per iteration (ms): 126259.5 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.716807E+00 | loss scale: 32768.0 | grad norm: 32619.947 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.57 | iteration 15525/ 292968 | consumed samples: 31795200 | consumed tokens: 15510650880 | elapsed time per iteration (ms): 123888.9 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.681340E+00 | loss scale: 32768.0 | grad norm: 26498.436 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.40 | iteration 15526/ 292968 | consumed samples: 31797248 | consumed tokens: 15512518656 | elapsed time per iteration (ms): 125486.4 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.697082E+00 | loss scale: 32768.0 | grad norm: 24851.431 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.16 | iteration 15527/ 292968 | consumed samples: 
31799296 | consumed tokens: 15514386432 | elapsed time per iteration (ms): 126587.5 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.722326E+00 | loss scale: 32768.0 | grad norm: 28423.206 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.33 | iteration 15528/ 292968 | consumed samples: 31801344 | consumed tokens: 15516254208 | elapsed time per iteration (ms): 124539.4 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.699316E+00 | loss scale: 32768.0 | grad norm: 26007.595 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.89 | iteration 15529/ 292968 | consumed samples: 31803392 | consumed tokens: 15518121984 | elapsed time per iteration (ms): 123497.2 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.728806E+00 | loss scale: 32768.0 | grad norm: 21639.714 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 97.71 | iteration 15530/ 292968 | consumed samples: 31805440 | consumed tokens: 15519989760 | elapsed time per iteration (ms): 124844.0 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.744741E+00 | loss scale: 32768.0 | grad norm: 23293.565 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.66 | iteration 15531/ 292968 | consumed samples: 31807488 | consumed tokens: 15521857536 | elapsed time per iteration (ms): 124789.2 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.695158E+00 | loss scale: 32768.0 | grad norm: 26848.830 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.70 | iteration 15532/ 292968 | 
consumed samples: 31809536 | consumed tokens: 15523725312 | elapsed time per iteration (ms): 123103.4 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.692224E+00 | loss scale: 32768.0 | grad norm: 22317.575 | num zeros: 0.0 | curriculum seqlen: 912 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.02 | iteration 15533/ 292968 | consumed samples: 31811584 | consumed tokens: 15525609472 | elapsed time per iteration (ms): 125185.1 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.718478E+00 | loss scale: 32768.0 | grad norm: 27392.815 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.24 | iteration 15534/ 292968 | consumed samples: 31813632 | consumed tokens: 15527493632 | elapsed time per iteration (ms): 126201.9 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.705431E+00 | loss scale: 32768.0 | grad norm: 23474.009 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.46 | iteration 15535/ 292968 | consumed samples: 31815680 | consumed tokens: 15529377792 | elapsed time per iteration (ms): 124039.2 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.699100E+00 | loss scale: 32768.0 | grad norm: 16327.807 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.14 | iteration 15536/ 292968 | consumed samples: 31817728 | consumed tokens: 15531261952 | elapsed time per iteration (ms): 124463.7 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.696484E+00 | loss scale: 32768.0 | grad norm: 21515.951 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.80 | iteration 
15537/ 292968 | consumed samples: 31819776 | consumed tokens: 15533146112 | elapsed time per iteration (ms): 124874.7 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.742755E+00 | loss scale: 32768.0 | grad norm: 24526.295 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.48 | iteration 15538/ 292968 | consumed samples: 31821824 | consumed tokens: 15535030272 | elapsed time per iteration (ms): 123235.7 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.716711E+00 | loss scale: 32768.0 | grad norm: 32282.225 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.78 | iteration 15539/ 292968 | consumed samples: 31823872 | consumed tokens: 15536914432 | elapsed time per iteration (ms): 124740.7 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.727600E+00 | loss scale: 32768.0 | grad norm: 32590.057 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.59 | iteration 15540/ 292968 | consumed samples: 31825920 | consumed tokens: 15538798592 | elapsed time per iteration (ms): 123691.9 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.724909E+00 | loss scale: 32768.0 | grad norm: 20951.263 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.41 | iteration 15541/ 292968 | consumed samples: 31827968 | consumed tokens: 15540682752 | elapsed time per iteration (ms): 130145.1 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.713869E+00 | loss scale: 32768.0 | grad norm: 27207.811 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 93.53 | 
iteration 15542/ 292968 | consumed samples: 31830016 | consumed tokens: 15542566912 | elapsed time per iteration (ms): 136560.9 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.697396E+00 | loss scale: 32768.0 | grad norm: 34785.172 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 89.14 | iteration 15543/ 292968 | consumed samples: 31832064 | consumed tokens: 15544451072 | elapsed time per iteration (ms): 127379.3 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.763609E+00 | loss scale: 32768.0 | grad norm: 29808.577 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.56 | iteration 15544/ 292968 | consumed samples: 31834112 | consumed tokens: 15546335232 | elapsed time per iteration (ms): 123726.8 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.736038E+00 | loss scale: 32768.0 | grad norm: 25728.193 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.39 | iteration 15545/ 292968 | consumed samples: 31836160 | consumed tokens: 15548219392 | elapsed time per iteration (ms): 124275.2 | learning rate: 5.953E-05 | global batch size: 2048 | lm loss: 2.749746E+00 | loss scale: 32768.0 | grad norm: 27881.314 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.95 | iteration 15546/ 292968 | consumed samples: 31838208 | consumed tokens: 15550103552 | elapsed time per iteration (ms): 122880.0 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.769890E+00 | loss scale: 32768.0 | grad norm: 35599.014 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | 
TFLOPs: 99.06 | iteration 15547/ 292968 | consumed samples: 31840256 | consumed tokens: 15551987712 | elapsed time per iteration (ms): 125016.2 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.760728E+00 | loss scale: 32768.0 | grad norm: 19414.622 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.37 | iteration 15548/ 292968 | consumed samples: 31842304 | consumed tokens: 15553871872 | elapsed time per iteration (ms): 124093.0 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.721499E+00 | loss scale: 32768.0 | grad norm: 22403.177 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.10 | iteration 15549/ 292968 | consumed samples: 31844352 | consumed tokens: 15555756032 | elapsed time per iteration (ms): 123946.7 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.745769E+00 | loss scale: 32768.0 | grad norm: 22165.771 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.21 | iteration 15550/ 292968 | consumed samples: 31846400 | consumed tokens: 15557640192 | elapsed time per iteration (ms): 124384.3 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.740939E+00 | loss scale: 32768.0 | grad norm: 22823.282 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.87 | saving checkpoint at iteration 15550 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 [2022-01-27 14:45:29,257] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/mp_rank_00_model_states.pt [2022-01-27 14:45:30,284] [INFO] [logging.py:69:log_dist] 
[Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/mp_rank_01_model_states.pt [2022-01-27 14:45:43,155] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-01-27 14:45:43,437] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-01-27 14:45:45,341] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-01-27 14:45:45,356] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-01-27 14:45:45,501] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-01-27 14:45:45,503] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-01-27 14:45:45,524] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-01-27 14:45:45,597] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-01-27 14:45:51,136] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-01-27 14:45:53,010] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-01-27 14:45:53,183] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-01-27 14:45:53,692] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-01-27 14:45:53,809] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-01-27 14:45:53,804] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-01-27 14:45:54,329] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-01-27 14:45:54,803] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-01-27 14:45:54,815] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-01-27 
14:45:55,155] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-01-27 14:45:55,328] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-01-27 14:45:55,440] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-01-27 14:45:55,428] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-01-27 14:45:55,533] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-01-27 14:45:55,565] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-01-27 14:45:55,588] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-01-27 14:45:55,808] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-01-27 14:45:55,821] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-01-27 14:45:56,880] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-01-27 14:45:56,882] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-01-27 14:45:56,947] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-01-27 14:45:57,043] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-01-27 14:45:57,066] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-01-27 14:45:57,122] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-01-27 14:45:57,166] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-01-27 14:45:57,316] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-01-27 14:45:57,363] [INFO] [engine.py:3007:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-01-27 14:45:57,380] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-01-27 14:45:57,401] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-01-27 14:45:57,531] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-01-27 14:45:57,577] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-01-27 14:45:57,605] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-01-27 14:45:57,659] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-01-27 14:45:57,685] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-01-27 14:45:57,916] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-01-27 14:45:58,073] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-01-27 14:45:58,093] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-01-27 14:45:58,064] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-01-27 14:45:58,157] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-01-27 14:45:58,365] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-01-27 14:45:58,425] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-01-27 14:45:58,445] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-01-27 14:45:58,534] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-01-27 14:45:59,097] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-01-27 
14:45:59,230] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-01-27 14:45:59,353] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-01-27 14:45:59,547] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-01-27 14:45:59,690] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-01-27 14:45:59,756] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-01-27 14:45:59,815] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-01-27 14:46:00,013] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-01-27 14:46:00,072] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-01-27 14:46:00,120] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-01-27 14:46:00,186] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-01-27 14:46:00,237] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-01-27 14:46:00,374] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-01-27 14:46:00,691] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-01-27 14:46:00,789] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-01-27 14:46:01,095] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-01-27 14:46:01,123] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-01-27 14:46:01,373] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-01-27 14:46:01,541] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-01-27 14:46:02,041] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-01-27 14:46:02,781] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-01-27 14:46:03,046] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-01-27 14:46:03,127] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-01-27 14:46:03,160] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-01-27 14:46:03,165] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_124_optim_states.pt [2022-01-27 14:46:03,248] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-01-27 14:46:03,248] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-01-27 14:46:03,272] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-01-27 14:46:03,294] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-01-27 14:46:03,640] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-01-27 14:46:03,976] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-01-27 14:46:04,216] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-01-27 14:46:04,301] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-01-27 14:46:04,503] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-01-27 14:46:04,505] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-01-27 14:46:04,634] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-01-27 
14:46:04,519] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-01-27 14:46:04,544] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-01-27 14:46:04,741] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-01-27 14:46:04,882] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-01-27 14:46:04,963] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-01-27 14:46:04,977] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-01-27 14:46:05,040] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-01-27 14:46:05,019] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-01-27 14:46:05,055] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-01-27 14:46:05,031] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-01-27 14:46:05,317] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-01-27 14:46:05,433] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-01-27 14:46:05,445] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-01-27 14:46:05,760] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-01-27 14:46:05,771] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-01-27 14:46:05,787] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-01-27 14:46:05,847] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-01-27 14:46:06,220] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-01-27 14:46:06,270] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-01-27 14:46:06,384] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-01-27 14:46:06,401] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-01-27 14:46:06,500] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-01-27 14:46:06,725] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-01-27 14:46:06,810] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-01-27 14:46:06,874] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-01-27 14:46:06,975] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-01-27 14:46:07,480] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-01-27 14:46:07,577] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-01-27 14:46:07,627] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-01-27 14:46:07,668] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-01-27 14:46:08,280] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-01-27 14:46:08,289] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-01-27 14:46:08,899] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-01-27 14:46:08,929] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-01-27 14:46:08,984] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-01-27 
14:46:09,031] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_56_optim_states.pt
[2022-01-27 14:46:09,046] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_07_optim_states.pt
[2022-01-27 14:46:09,155] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_01_optim_states.pt
[2022-01-27 14:46:09,194] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2022-01-27 14:46:10,746] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_03_optim_states.pt
[2022-01-27 14:46:10,796] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15550/zero_pp_rank_0_mp_rank_02_optim_states.pt
successfully saved checkpoint at iteration 15550 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
time (ms) | save-checkpoint: 45295.92
iteration 15551/ 292968 | consumed samples: 31848448 | consumed tokens: 15559524352 | elapsed time per iteration (ms): 171458.6 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.736945E+00 | loss scale: 32768.0 | grad norm: 17684.145 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 71.00 |
iteration 15552/ 292968 | consumed samples: 31850496 | consumed tokens: 15561408512 | elapsed time per iteration (ms): 125472.9 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.728887E+00 | loss scale: 32768.0 | grad norm: 19130.260 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.02 |
iteration 15553/ 292968 | consumed samples: 31852544 | consumed tokens: 15563292672 | elapsed time per iteration (ms): 124321.9 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.715191E+00 | loss scale: 32768.0 | grad norm: 14321.261 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.91 |
iteration 15554/ 292968 | consumed samples: 31854592 | consumed tokens: 15565176832 | elapsed time per iteration (ms): 129917.8 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.715913E+00 | loss scale: 32768.0 | grad norm: 18105.010 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 93.70 |
saving checkpoint at iteration 15554 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
[2022-01-27 14:54:40,985] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/mp_rank_01_model_states.pt
[2022-01-27 14:54:41,060] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/mp_rank_00_model_states.pt
[2022-01-27 14:54:52,256] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_108_optim_states.pt
[2022-01-27 14:54:53,835] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-01-27 14:54:54,405] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-01-27 14:54:55,437] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-01-27 14:54:55,901] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-01-27 14:54:56,025] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-01-27 14:54:56,095] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-01-27 14:54:56,115] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-01-27 14:55:00,863] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-01-27 14:55:02,015] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-01-27 14:55:02,040] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-01-27 14:55:03,142] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-01-27 14:55:03,408] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-01-27 14:55:03,566] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-01-27 14:55:03,557] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-01-27 14:55:03,590] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-01-27 14:55:03,702] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-01-27 14:55:04,251] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-01-27 14:55:04,317] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-01-27 
14:55:04,365] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-01-27 14:55:04,505] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-01-27 14:55:04,520] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-01-27 14:55:04,610] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-01-27 14:55:04,642] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-01-27 14:55:04,726] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-01-27 14:55:04,779] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-01-27 14:55:04,911] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-01-27 14:55:04,914] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-01-27 14:55:05,013] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-01-27 14:55:05,193] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-01-27 14:55:06,622] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-01-27 14:55:06,645] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-01-27 14:55:06,815] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-01-27 14:55:06,835] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-01-27 14:55:07,186] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_124_optim_states.pt [2022-01-27 14:55:07,344] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-01-27 14:55:07,495] [INFO] [engine.py:3007:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-01-27 14:55:07,720] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-01-27 14:55:07,838] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-01-27 14:55:07,938] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-01-27 14:55:07,968] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-01-27 14:55:07,968] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-01-27 14:55:08,042] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-01-27 14:55:08,300] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-01-27 14:55:08,330] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-01-27 14:55:08,440] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-01-27 14:55:08,678] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-01-27 14:55:08,780] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-01-27 14:55:09,006] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-01-27 14:55:09,025] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-01-27 14:55:09,066] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-01-27 14:55:09,067] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-01-27 14:55:09,126] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-01-27 14:55:09,183] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-01-27 
14:55:09,333] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-01-27 14:55:09,502] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-01-27 14:55:09,486] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-01-27 14:55:09,803] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-01-27 14:55:10,009] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-01-27 14:55:10,026] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-01-27 14:55:10,070] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-01-27 14:55:10,310] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-01-27 14:55:10,331] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-01-27 14:55:10,458] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-01-27 14:55:10,532] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-01-27 14:55:10,516] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-01-27 14:55:10,471] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-01-27 14:55:10,581] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-01-27 14:55:10,665] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-01-27 14:55:10,713] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-01-27 14:55:10,742] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-01-27 14:55:10,793] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-01-27 14:55:10,882] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-01-27 14:55:10,890] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-01-27 14:55:10,916] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-01-27 14:55:10,903] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-01-27 14:55:11,189] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-01-27 14:55:11,215] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-01-27 14:55:11,307] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-01-27 14:55:11,353] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-01-27 14:55:11,567] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-01-27 14:55:11,588] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-01-27 14:55:11,618] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-01-27 14:55:11,653] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-01-27 14:55:11,677] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-01-27 14:55:11,713] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-01-27 14:55:11,722] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-01-27 14:55:11,733] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-01-27 14:55:11,770] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-01-27 
14:55:11,791] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-01-27 14:55:11,941] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-01-27 14:55:11,948] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-01-27 14:55:12,083] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-01-27 14:55:12,117] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-01-27 14:55:12,284] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-01-27 14:55:12,337] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-01-27 14:55:12,364] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-01-27 14:55:12,414] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-01-27 14:55:12,573] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-01-27 14:55:12,577] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-01-27 14:55:12,701] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-01-27 14:55:12,745] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-01-27 14:55:12,830] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-01-27 14:55:12,846] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-01-27 14:55:13,077] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-01-27 14:55:13,262] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-01-27 14:55:13,312] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-01-27 14:55:13,822] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-01-27 14:55:13,930] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-01-27 14:55:14,484] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-01-27 14:55:14,505] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-01-27 14:55:14,541] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-01-27 14:55:14,669] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-01-27 14:55:14,962] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-01-27 14:55:15,075] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-01-27 14:55:15,928] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-01-27 14:55:15,961] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-01-27 14:55:16,420] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-01-27 14:55:17,881] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-01-27 14:55:17,989] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-01-27 14:55:19,302] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-01-27 14:55:19,552] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-01-27 14:55:19,997] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-01-27 14:55:20,044] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-01-27 
14:55:20,987] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-01-27 14:55:21,234] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-01-27 14:55:21,304] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-01-27 14:55:21,416] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15554/zero_pp_rank_0_mp_rank_27_optim_states.pt successfully saved checkpoint at iteration 15554 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 time (ms) | save-checkpoint: 44744.88 [exiting program after 1186.611139690876 minutes] datetime: 2022-01-27 14:55:21 ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** using world size: 128, data-parallel-size: 1, tensor-model-parallel size: 4, pipeline-model-parallel size: 32 using torch.float16 for parameters ... ------------------------ arguments ------------------------ accumulate_allreduce_grads_in_fp32 .............. False adam_beta1 ...................................... 0.9 adam_beta2 ...................................... 0.95 adam_eps ........................................ 1e-08 adlr_autoresume ................................. False adlr_autoresume_interval ........................ 1000 apply_query_key_layer_scaling ................... True apply_residual_connection_post_layernorm ........ False attention_dropout ............................... 0.1 attention_softmax_in_fp32 ....................... False bert_binary_head ................................ True bert_load ....................................... None bf16 ............................................ False bias_dropout_fusion ............................. True bias_gelu_fusion ................................ True biencoder_projection_dim ........................ 0 biencoder_shared_query_context_model ............ False block_data_path ................................. None checkpoint_activations .......................... True checkpoint_in_cpu ............................... False checkpoint_num_layers ........................... 1 clip_grad ....................................... 1.0 codecarbon_dir .................................. None consumed_train_samples .......................... 0 consumed_train_tokens ........................... 0 consumed_valid_samples .......................... 
0 contigious_checkpointing ........................ False cpu_optimizer ................................... False cpu_torch_adam .................................. False curriculum_learning ............................. False data_impl ....................................... mmap data_parallel_size .............................. 1 data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document'] dataloader_type ................................. single DDP_impl ........................................ local decoder_seq_length .............................. None deepscale ....................................... False deepscale_config ................................ None deepspeed ....................................... True deepspeed_activation_checkpointing .............. True deepspeed_config ................................ ./ds_config.1557042.json deepspeed_mpi ................................... False distribute_checkpointed_activations ............. False distributed_backend ............................. nccl embed_layernorm ................................. True embedding_path .................................. None encoder_seq_length .............................. 2048 eod_mask_loss ................................... False eval_interval ................................... 150 eval_iters ...................................... 5 eval_only ....................................... None evidence_data_path .............................. None exit_duration_in_mins ........................... 1185 exit_interval ................................... None ffn_hidden_size ................................. 46400 finetune ........................................ False fp16 ............................................ True fp16_lm_cross_entropy ........................... False fp32_residual_connection ........................ False gigaflos_no_embeds .............................. 
0 global_batch_size ............................... 2048 glu_activation .................................. None hidden_dropout .................................. 0.1 hidden_size ..................................... 11600 hysteresis ...................................... 2 ict_head_size ................................... None ict_load ........................................ None img_dim ......................................... 224 indexer_batch_size .............................. 128 indexer_log_interval ............................ 1000 init_method_std ................................. 0.006 init_method_xavier_uniform ...................... False initial_loss_scale .............................. 4294967296 kv_channels ..................................... 145 layernorm_epsilon ............................... 1e-05 lazy_mpu_init ................................... None load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 local_rank ...................................... 0 log_batch_size_to_tensorboard ................... True log_interval .................................... 1 log_learning_rate_to_tensorboard ................ True log_level ....................................... None log_level_replica ............................... None log_loss_scale_to_tensorboard ................... True log_num_zeros_in_grad ........................... False log_params_norm ................................. False log_timers_to_tensorboard ....................... True log_validation_ppl_to_tensorboard ............... True loss_on_targets_only ............................ False loss_scale ...................................... 12.0 loss_scale_window ............................... 1000 lr .............................................. 6e-05 lr_decay_iters .................................. None lr_decay_samples ................................ None lr_decay_style .................................. 
cosine lr_decay_tokens ................................. 260000000000 lr_warmup_fraction .............................. None lr_warmup_iters ................................. 0 lr_warmup_samples ............................... 3750000 make_vocab_size_divisible_by .................... 128 mask_prob ....................................... 0.15 masked_softmax_fusion ........................... True max_position_embeddings ......................... 2048 memory_centric_tiled_linear ..................... False merge_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt micro_batch_size ................................ 1 min_loss_scale .................................. 1.0 min_lr .......................................... 6e-06 mmap_warmup ..................................... False no_load_optim ................................... None no_load_rng ..................................... None no_save_optim ................................... None no_save_rng ..................................... None num_attention_heads ............................. 80 num_channels .................................... 3 num_classes ..................................... 1000 num_layers ...................................... 64 num_layers_per_virtual_pipeline_stage ........... None num_workers ..................................... 2 onnx_safe ....................................... None openai_gelu ..................................... False optimizer ....................................... adam override_lr_scheduler ........................... False params_dtype .................................... torch.float16 partition_activations ........................... False patch_dim ....................................... 16 pipeline_model_parallel_size .................... 32 position_embedding_type ......................... PositionEmbeddingType.absolute profile_backward ................................ 
False query_in_block_prob ............................. 0.1 rampup_batch_size ............................... None rank ............................................ 0 remote_device ................................... none reset_attention_mask ............................ False reset_position_ids .............................. False retriever_report_topk_accuracies ................ [] retriever_score_scaling ......................... False retriever_seq_length ............................ 256 reweight_loss_based_on_position_frequency ....... False sample_rate ..................................... 1.0 save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 save_interval ................................... 50 scatter_gather_tensors_in_pipeline .............. True scattered_embeddings ............................ False seed ............................................ 43 seq_length ...................................... 2048 sgd_momentum .................................... 0.9 short_seq_prob .................................. 0.1 skip_train_iteration_range ...................... None split ........................................... 949,50,1 split_transformers .............................. False synchronize_each_layer .......................... False tensor_model_parallel_size ...................... 4 tensorboard_debug_dir ........................... None tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs//tensorboard/cl-a100 tensorboard_log_interval ........................ 1 tensorboard_queue_size .......................... 5 test_weighted_split_names ....................... None test_weighted_split_paths ....................... None test_weighted_split_splits ...................... None test_weighted_split_weights ..................... None tile_factor ..................................... 
1 titles_data_path ................................ None tokenizer_name_or_path .......................... None tokenizer_type .................................. GPT2BPETokenizer train_iters ..................................... None train_samples ................................... 600000000 train_tokens .................................... 300000000000 train_weighted_split_paths ...................... None use_bnb_optimizer ............................... False use_checkpoint_lr_scheduler ..................... False use_contiguous_buffers_in_ddp ................... False use_cpu_initialization .......................... None use_one_sent_docs ............................... False use_pin_memory .................................. False valid_weighted_split_names ...................... None valid_weighted_split_paths ...................... None valid_weighted_split_splits ..................... None valid_weighted_split_weights .................... None virtual_pipeline_model_parallel_size ............ None vocab_extra_ids ................................. 0 vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json weight_decay .................................... 0.1 world_size ...................................... 128 zero_allgather_bucket_size ...................... 0.0 zero_contigious_gradients ....................... False zero_reduce_bucket_size ......................... 0.0 zero_reduce_scatter ............................. False zero_stage ...................................... 1 -------------------- end of arguments --------------------- setting number of micro-batches to constant 2048 > building GPT2BPETokenizer tokenizer ... 
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688) -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] cpu_adagrad ............ [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- > setting tensorboard ... DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] deepspeed info ................... 
0.6.0+24fe7002, 24fe7002, elastic-ckpt-refresh deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=f689231 git_branch=log-grad-norm **** > initializing torch distributed ... > initializing tensor model parallel with size 4 > initializing pipeline model parallel with size 32 > setting random seeds to 43 ... [2022-01-27 17:38:21,898] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2761 and data parallel seed: 43 > compiling dataset index builder ... make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/data' make: Nothing to be done for 'default'. make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/data' >>> done with dataset index builder. Compilation time: 0.167 seconds > compiling and loading fused kernels ... /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... 
Building extension module scaled_upper_triang_masked_softmax_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module scaled_upper_triang_masked_softmax_cuda... /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module scaled_masked_softmax_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module scaled_masked_softmax_cuda... /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. 
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module fused_mix_prec_layer_norm_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module fused_mix_prec_layer_norm_cuda... >>> done with compiling and loading fused kernels. Compilation time: 13.280 seconds time to initialize megatron (seconds): 11.347 [after megatron is initialized] datetime: 2022-01-27 17:38:35 building GPT model ... [2022-01-27 17:38:35,348] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,348] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,348] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,348] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,348] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,348] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,348] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,348] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,348] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is 
not available in torch 1.8.1 [2022-01-27 17:38:35,348] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,348] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,348] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,348] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,348] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,348] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,348] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,348] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,348] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,348] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,348] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,348] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,348] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,348] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,348] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is 
not available in torch 1.8.1 [2022-01-27 17:38:35,348] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,348] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,348] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,348] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,348] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is 
not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is 
not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is 
not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-01-27 17:38:35,349] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is 
not available in torch 1.8.1
[2022-01-27 17:38:35,388] [INFO] [utils.py:824:see_memory_usage] Before Building Model
[2022-01-27 17:38:35,388] [INFO] [utils.py:825:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2022-01-27 17:38:35,388] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 39.65 GB, percent = 7.9%
[2022-01-27 17:38:35,389] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=1, data=0, model=0): 4, ProcessCoord(pipe=1, data=0, model=1): 5, ProcessCoord(pipe=1, data=0, model=2): 6, ProcessCoord(pipe=1, data=0, model=3): 7, ProcessCoord(pipe=2, data=0, model=0): 8, ProcessCoord(pipe=2, data=0, model=1): 9, ProcessCoord(pipe=2, data=0, model=2): 10, ProcessCoord(pipe=2, data=0, model=3): 11, ProcessCoord(pipe=3, data=0, model=0): 12, ProcessCoord(pipe=3, data=0, model=1): 13, ProcessCoord(pipe=3, data=0, model=2): 14, ProcessCoord(pipe=3, data=0, model=3): 15, ProcessCoord(pipe=4, data=0, model=0): 
16, ProcessCoord(pipe=4, data=0, model=1): 17, ProcessCoord(pipe=4, data=0, model=2): 18, ProcessCoord(pipe=4, data=0, model=3): 19, ProcessCoord(pipe=5, data=0, model=0): 20, ProcessCoord(pipe=5, data=0, model=1): 21, ProcessCoord(pipe=5, data=0, model=2): 22, ProcessCoord(pipe=5, data=0, model=3): 23, ProcessCoord(pipe=6, data=0, model=0): 24, ProcessCoord(pipe=6, data=0, model=1): 25, ProcessCoord(pipe=6, data=0, model=2): 26, ProcessCoord(pipe=6, data=0, model=3): 27, ProcessCoord(pipe=7, data=0, model=0): 28, ProcessCoord(pipe=7, data=0, model=1): 29, ProcessCoord(pipe=7, data=0, model=2): 30, ProcessCoord(pipe=7, data=0, model=3): 31, ProcessCoord(pipe=8, data=0, model=0): 32, ProcessCoord(pipe=8, data=0, model=1): 33, ProcessCoord(pipe=8, data=0, model=2): 34, ProcessCoord(pipe=8, data=0, model=3): 35, ProcessCoord(pipe=9, data=0, model=0): 36, ProcessCoord(pipe=9, data=0, model=1): 37, ProcessCoord(pipe=9, data=0, model=2): 38, ProcessCoord(pipe=9, data=0, model=3): 39, ProcessCoord(pipe=10, data=0, model=0): 40, ProcessCoord(pipe=10, data=0, model=1): 41, ProcessCoord(pipe=10, data=0, model=2): 42, ProcessCoord(pipe=10, data=0, model=3): 43, ProcessCoord(pipe=11, data=0, model=0): 44, ProcessCoord(pipe=11, data=0, model=1): 45, ProcessCoord(pipe=11, data=0, model=2): 46, ProcessCoord(pipe=11, data=0, model=3): 47, ProcessCoord(pipe=12, data=0, model=0): 48, ProcessCoord(pipe=12, data=0, model=1): 49, ProcessCoord(pipe=12, data=0, model=2): 50, ProcessCoord(pipe=12, data=0, model=3): 51, ProcessCoord(pipe=13, data=0, model=0): 52, ProcessCoord(pipe=13, data=0, model=1): 53, ProcessCoord(pipe=13, data=0, model=2): 54, ProcessCoord(pipe=13, data=0, model=3): 55, ProcessCoord(pipe=14, data=0, model=0): 56, ProcessCoord(pipe=14, data=0, model=1): 57, ProcessCoord(pipe=14, data=0, model=2): 58, ProcessCoord(pipe=14, data=0, model=3): 59, ProcessCoord(pipe=15, data=0, model=0): 60, ProcessCoord(pipe=15, data=0, model=1): 61, ProcessCoord(pipe=15, data=0, 
model=2): 62, ProcessCoord(pipe=15, data=0, model=3): 63, ProcessCoord(pipe=16, data=0, model=0): 64, ProcessCoord(pipe=16, data=0, model=1): 65, ProcessCoord(pipe=16, data=0, model=2): 66, ProcessCoord(pipe=16, data=0, model=3): 67, ProcessCoord(pipe=17, data=0, model=0): 68, ProcessCoord(pipe=17, data=0, model=1): 69, ProcessCoord(pipe=17, data=0, model=2): 70, ProcessCoord(pipe=17, data=0, model=3): 71, ProcessCoord(pipe=18, data=0, model=0): 72, ProcessCoord(pipe=18, data=0, model=1): 73, ProcessCoord(pipe=18, data=0, model=2): 74, ProcessCoord(pipe=18, data=0, model=3): 75, ProcessCoord(pipe=19, data=0, model=0): 76, ProcessCoord(pipe=19, data=0, model=1): 77, ProcessCoord(pipe=19, data=0, model=2): 78, ProcessCoord(pipe=19, data=0, model=3): 79, ProcessCoord(pipe=20, data=0, model=0): 80, ProcessCoord(pipe=20, data=0, model=1): 81, ProcessCoord(pipe=20, data=0, model=2): 82, ProcessCoord(pipe=20, data=0, model=3): 83, ProcessCoord(pipe=21, data=0, model=0): 84, ProcessCoord(pipe=21, data=0, model=1): 85, ProcessCoord(pipe=21, data=0, model=2): 86, ProcessCoord(pipe=21, data=0, model=3): 87, ProcessCoord(pipe=22, data=0, model=0): 88, ProcessCoord(pipe=22, data=0, model=1): 89, ProcessCoord(pipe=22, data=0, model=2): 90, ProcessCoord(pipe=22, data=0, model=3): 91, ProcessCoord(pipe=23, data=0, model=0): 92, ProcessCoord(pipe=23, data=0, model=1): 93, ProcessCoord(pipe=23, data=0, model=2): 94, ProcessCoord(pipe=23, data=0, model=3): 95, ProcessCoord(pipe=24, data=0, model=0): 96, ProcessCoord(pipe=24, data=0, model=1): 97, ProcessCoord(pipe=24, data=0, model=2): 98, ProcessCoord(pipe=24, data=0, model=3): 99, ProcessCoord(pipe=25, data=0, model=0): 100, ProcessCoord(pipe=25, data=0, model=1): 101, ProcessCoord(pipe=25, data=0, model=2): 102, ProcessCoord(pipe=25, data=0, model=3): 103, ProcessCoord(pipe=26, data=0, model=0): 104, ProcessCoord(pipe=26, data=0, model=1): 105, ProcessCoord(pipe=26, data=0, model=2): 106, ProcessCoord(pipe=26, data=0, model=3): 
107, ProcessCoord(pipe=27, data=0, model=0): 108, ProcessCoord(pipe=27, data=0, model=1): 109, ProcessCoord(pipe=27, data=0, model=2): 110, ProcessCoord(pipe=27, data=0, model=3): 111, ProcessCoord(pipe=28, data=0, model=0): 112, ProcessCoord(pipe=28, data=0, model=1): 113, ProcessCoord(pipe=28, data=0, model=2): 114, ProcessCoord(pipe=28, data=0, model=3): 115, ProcessCoord(pipe=29, data=0, model=0): 116, ProcessCoord(pipe=29, data=0, model=1): 117, ProcessCoord(pipe=29, data=0, model=2): 118, ProcessCoord(pipe=29, data=0, model=3): 119, ProcessCoord(pipe=30, data=0, model=0): 120, ProcessCoord(pipe=30, data=0, model=1): 121, ProcessCoord(pipe=30, data=0, model=2): 122, ProcessCoord(pipe=30, data=0, model=3): 123, ProcessCoord(pipe=31, data=0, model=0): 124, ProcessCoord(pipe=31, data=0, model=1): 125, ProcessCoord(pipe=31, data=0, model=2): 126, ProcessCoord(pipe=31, data=0, model=3): 127}
[2022-01-27 17:38:37,062] [INFO] [module.py:365:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=5
    0: _to_float16
    1: EmbeddingPipe
    2:
    3: ParallelTransformerLayerPipe
    4: ParallelTransformerLayerPipe
stage=1 layers=2
    5: ParallelTransformerLayerPipe
    6: ParallelTransformerLayerPipe
stage=2 layers=2
    7: ParallelTransformerLayerPipe
    8: ParallelTransformerLayerPipe
stage=3 layers=2
    9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
stage=4 layers=2
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
stage=5 layers=2
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=6 layers=2
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
stage=7 layers=2
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
stage=8 layers=2
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
stage=9 layers=2
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
stage=10 layers=2
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
stage=11 layers=2
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
stage=12 layers=2
    27: ParallelTransformerLayerPipe
    28: ParallelTransformerLayerPipe
stage=13 layers=2
    29: ParallelTransformerLayerPipe
    30: ParallelTransformerLayerPipe
stage=14 layers=2
    31: ParallelTransformerLayerPipe
    32: ParallelTransformerLayerPipe
stage=15 layers=2
    33: ParallelTransformerLayerPipe
    34: ParallelTransformerLayerPipe
stage=16 layers=2
    35: ParallelTransformerLayerPipe
    36: ParallelTransformerLayerPipe
stage=17 layers=2
    37: ParallelTransformerLayerPipe
    38: ParallelTransformerLayerPipe
stage=18 layers=2
    39: ParallelTransformerLayerPipe
    40: ParallelTransformerLayerPipe
stage=19 layers=2
    41: ParallelTransformerLayerPipe
    42: ParallelTransformerLayerPipe
stage=20 layers=2
    43: ParallelTransformerLayerPipe
    44: ParallelTransformerLayerPipe
stage=21 layers=2
    45: ParallelTransformerLayerPipe
    46: ParallelTransformerLayerPipe
stage=22 layers=2
    47: ParallelTransformerLayerPipe
    48: ParallelTransformerLayerPipe
stage=23 layers=2
    49: ParallelTransformerLayerPipe
    50: ParallelTransformerLayerPipe
stage=24 layers=2
    51: ParallelTransformerLayerPipe
    52: ParallelTransformerLayerPipe
stage=25 layers=2
    53: ParallelTransformerLayerPipe
    54: ParallelTransformerLayerPipe
stage=26 layers=2
    55: ParallelTransformerLayerPipe
    56: ParallelTransformerLayerPipe
stage=27 layers=2
    57: ParallelTransformerLayerPipe
    58: ParallelTransformerLayerPipe
stage=28 layers=2
    59: ParallelTransformerLayerPipe
    60: ParallelTransformerLayerPipe
stage=29 layers=2
    61: ParallelTransformerLayerPipe
    62: ParallelTransformerLayerPipe
stage=30 layers=2
    63: ParallelTransformerLayerPipe
    64: ParallelTransformerLayerPipe
stage=31 layers=6
    65: ParallelTransformerLayerPipe
    66: ParallelTransformerLayerPipe
    67:
    68: MixedFusedLayerNorm
    69: EmbeddingPipe
    70: float16_to_fp32
  loss: CrossEntropy
[2022-01-27 17:38:37,769] [INFO] [utils.py:824:see_memory_usage] After Building Model
[2022-01-27 17:38:37,769] [INFO] [utils.py:825:see_memory_usage] MA 1.88 GB Max_MA 
1.88 GB CA 1.91 GB Max_CA 2 GB
[2022-01-27 17:38:37,769] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 40.12 GB, percent = 8.0%
setting training iterations to 292968
> learning rate decay style: cosine
DeepSpeed is enabled.
[2022-01-27 17:38:37,866] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed info: version=0.6.0+24fe7002, git-hash=24fe7002, git-branch=elastic-ckpt-refresh
[2022-01-27 17:38:38,489] [INFO] [engine.py:275:__init__] DeepSpeed Flops Profiler Enabled: False
[2022-01-27 17:38:38,489] [INFO] [engine.py:1093:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2022-01-27 17:38:38,489] [INFO] [engine.py:1099:_configure_optimizer] Using client Optimizer as basic optimizer
[2022-01-27 17:38:38,490] [INFO] [engine.py:1115:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2022-01-27 17:38:38,490] [INFO] [utils.py:44:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2022-01-27 17:38:38,490] [INFO] [logging.py:69:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2022-01-27 17:38:38,490] [INFO] [stage_1_and_2.py:121:__init__] Reduce bucket size 500000000
[2022-01-27 17:38:38,490] [INFO] [stage_1_and_2.py:122:__init__] Allgather bucket size 500000000
[2022-01-27 17:38:38,490] [INFO] [stage_1_and_2.py:123:__init__] CPU Offload: False
[2022-01-27 17:38:38,490] [INFO] [stage_1_and_2.py:124:__init__] Round robin gradient partitioning: False
Rank: 13 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 12 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 111 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 119 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 69 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 118 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 68 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 8 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 9 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 98 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 99 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 79 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 86 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 78 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 87 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 39 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 38 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 110 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 121 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 120 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 106 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 108 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 107 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 109 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 11 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 10 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 22 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 23 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 25 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 24 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 30 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 31 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 7 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 6 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 90 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 101 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 100 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 91 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 93 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 126 partition count [1, 1] and sizes[(978123600, False), (214600, False)]
Rank: 127 partition count [1, 1] and sizes[(978123600, False), (214600, False)]
Rank: 2 partition count [1, 1] and sizes[(978123600, False), (191400, False)]
Rank: 3 partition count [1, 1] and sizes[(978123600, False), (191400, False)]
Rank: 92 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 18 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 19 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 89 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 88 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 0 partition count [1, 1] and sizes[(978123600, False), (191400, False)]
Rank: 47 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 57 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 56 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 46 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 60 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 20 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 21 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 61 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 95 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 1 partition count [1, 1] and sizes[(978123600, False), (191400, False)]
Rank: 94 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 16 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 53 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 17 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 52 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 80 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 34 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 35 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 103 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 81 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 102 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 49 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 48 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 36 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 66 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 37 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 27 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 26 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 73 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 14 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 72 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 67 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 15 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 123 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 114 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 64 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 115 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 105 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 104 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 122 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 51 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 113 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 112 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 50 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 40 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 65 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 41 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 84 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 85 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 62 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 63 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 77 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 76 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 33 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 32 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 71 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 70 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 5 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 96 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 97 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 4 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 44 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 45 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 75 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 74 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 117 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 116 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 28 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 29 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 83 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 82 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 43 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 42 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 58 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 59 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 54 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 55 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 125 partition count [1, 1] and sizes[(978123600, False), (214600, False)]
Rank: 124 partition count [1, 1] and sizes[(978123600, False), (214600, False)]
[2022-01-27 17:38:43,382] [INFO] [utils.py:824:see_memory_usage] Before initializing optimizer states
[2022-01-27 17:38:43,382] [INFO] [utils.py:825:see_memory_usage] MA 5.47 GB Max_MA 7.29 GB CA 9.25 GB Max_CA 9 GB
[2022-01-27 17:38:43,382] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 39.88 GB, percent = 7.9%
[2022-01-27 17:38:43,452] [INFO] 
[utils.py:824:see_memory_usage] After initializing optimizer states
[2022-01-27 17:38:43,453] [INFO] [utils.py:825:see_memory_usage] MA 12.76 GB Max_MA 16.41 GB CA 20.19 GB Max_CA 20 GB
[2022-01-27 17:38:43,453] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 39.88 GB, percent = 7.9%
[2022-01-27 17:38:43,453] [INFO] [stage_1_and_2.py:493:__init__] optimizer state initialized
[2022-01-27 17:38:43,474] [INFO] [utils.py:824:see_memory_usage] After initializing ZeRO optimizer
[2022-01-27 17:38:43,474] [INFO] [utils.py:825:see_memory_usage] MA 12.76 GB Max_MA 12.76 GB CA 20.19 GB Max_CA 20 GB
[2022-01-27 17:38:43,474] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 39.88 GB, percent = 7.9%
[2022-01-27 17:38:43,474] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2022-01-27 17:38:43,474] [INFO] [engine.py:805:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2022-01-27 17:38:43,474] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2022-01-27 17:38:43,474] [INFO] [logging.py:69:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.95), (0.9, 0.95)]
[2022-01-27 17:38:43,475] [INFO] [config.py:1058:print] DeepSpeedEngine configuration:
[2022-01-27 17:38:43,475] [INFO] [config.py:1062:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
[2022-01-27 17:38:43,475] [INFO] [config.py:1062:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2022-01-27 17:38:43,475] [INFO] [config.py:1062:print] amp_enabled .................. False
[2022-01-27 17:38:43,475] [INFO] [config.py:1062:print] amp_params ................... False
[2022-01-27 17:38:43,475] [INFO] [config.py:1062:print] autotuning_config ............ { "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": null, "exps_dir": null, "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 }
[2022-01-27 17:38:43,475] [INFO] [config.py:1062:print] bfloat16_enabled ............. False
[2022-01-27 17:38:43,475] [INFO] [config.py:1062:print] checkpoint_tag_validation_enabled True
[2022-01-27 17:38:43,475] [INFO] [config.py:1062:print] checkpoint_tag_validation_fail False
[2022-01-27 17:38:43,475] [INFO] [config.py:1062:print] communication_data_type ...... None
[2022-01-27 17:38:43,475] [INFO] [config.py:1062:print] curriculum_enabled ........... True
[2022-01-27 17:38:43,475] [INFO] [config.py:1062:print] curriculum_params ............ {'curriculum_type': 'seqlen', 'min_difficulty': 64, 'max_difficulty': 2048, 'schedule_type': 'fixed_linear', 'schedule_config': {'total_curriculum_step': 36000, 'difficulty_step': 8}}
[2022-01-27 17:38:43,475] [INFO] [config.py:1062:print] dataloader_drop_last ......... False
[2022-01-27 17:38:43,475] [INFO] [config.py:1062:print] disable_allgather ............ False
[2022-01-27 17:38:43,475] [INFO] [config.py:1062:print] dump_state ................... False
[2022-01-27 17:38:43,475] [INFO] [config.py:1062:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2022-01-27 17:38:43,475] [INFO] [config.py:1062:print] eigenvalue_enabled ........... False
[2022-01-27 17:38:43,475] [INFO] [config.py:1062:print] eigenvalue_gas_boundary_resolution 1
[2022-01-27 17:38:43,475] [INFO] [config.py:1062:print] eigenvalue_layer_name ........ bert.encoder.layer
[2022-01-27 17:38:43,475] [INFO] [config.py:1062:print] eigenvalue_layer_num ......... 0
[2022-01-27 17:38:43,475] [INFO] [config.py:1062:print] eigenvalue_max_iter .......... 100
[2022-01-27 17:38:43,475] [INFO] [config.py:1062:print] eigenvalue_stability ......... 1e-06
[2022-01-27 17:38:43,475] [INFO] [config.py:1062:print] eigenvalue_tol ............... 0.01
[2022-01-27 17:38:43,475] [INFO] [config.py:1062:print] eigenvalue_verbose ........... False
[2022-01-27 17:38:43,475] [INFO] [config.py:1062:print] elasticity_enabled ........... False
[2022-01-27 17:38:43,475] [INFO] [config.py:1062:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2022-01-27 17:38:43,475] [INFO] [config.py:1062:print] fp16_enabled ................. True
[2022-01-27 17:38:43,475] [INFO] [config.py:1062:print] fp16_master_weights_and_gradients False
[2022-01-27 17:38:43,475] [INFO] [config.py:1062:print] fp16_mixed_quantize .......... False
[2022-01-27 17:38:43,475] [INFO] [config.py:1062:print] global_rank .................. 0
[2022-01-27 17:38:43,475] [INFO] [config.py:1062:print] gradient_accumulation_steps .. 2048
[2022-01-27 17:38:43,475] [INFO] [config.py:1062:print] gradient_clipping ............ 1.0
[2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] gradient_predivide_factor .... 1.0
[2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] initial_dynamic_scale ........ 4096
[2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] loss_scale ................... 0
[2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] memory_breakdown ............. False
[2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] optimizer_legacy_fusion ...... 
False [2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] optimizer_name ............... None [2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] optimizer_params ............. None [2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] pld_enabled .................. False [2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] pld_params ................... False [2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] prescale_gradients ........... False [2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] quantize_change_rate ......... 0.001 [2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] quantize_groups .............. 1 [2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] quantize_offset .............. 1000 [2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] quantize_period .............. 1000 [2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] quantize_rounding ............ 0 [2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] quantize_start_bits .......... 16 [2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] quantize_target_bits ......... 8 [2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] quantize_training_enabled .... False [2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] quantize_type ................ 0 [2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] quantize_verbose ............. False [2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] scheduler_name ............... None [2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] scheduler_params ............. None [2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] sparse_attention ............. None [2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] sparse_gradients_enabled ..... 
False [2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] steps_per_print .............. 2000 [2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] tensorboard_enabled .......... False [2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] tensorboard_job_name ......... DeepSpeedJobName [2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] tensorboard_output_path ...... [2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] train_batch_size ............. 2048 [2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] train_micro_batch_size_per_gpu 1 [2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] use_quantizer_kernel ......... False [2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] wall_clock_breakdown ......... False [2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] world_size ................... 1 [2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] zero_allow_untested_optimizer False [2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] zero_config .................. { "stage": 1, "contiguous_gradients": true, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": false, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false } [2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] zero_enabled ................. True [2022-01-27 17:38:43,476] [INFO] [config.py:1062:print] zero_optimization_stage ...... 
1 [2022-01-27 17:38:43,476] [INFO] [config.py:1064:print] json = { "train_micro_batch_size_per_gpu": 1, "train_batch_size": 2.048000e+03, "gradient_clipping": 1.0, "elastic_checkpoint": true, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "curriculum_learning": { "enabled": true, "curriculum_type": "seqlen", "min_difficulty": 64, "max_difficulty": 2.048000e+03, "schedule_type": "fixed_linear", "schedule_config": { "total_curriculum_step": 3.600000e+04, "difficulty_step": 8 } }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false } [2022-01-27 17:38:43,476] [INFO] [engine.py:93:__init__] CONFIG: micro_batches=2048 micro_batch_size=1 [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=1 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=6 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=0 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=2 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=7 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=5 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 
(104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=4 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=3 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=69 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=68 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=71 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=64 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=66 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=65 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=67 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=103 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 
(807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=96 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=97 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=100 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=70 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=32 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=38 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=33 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=35 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=102 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] 
[engine.py:151:__init__] RANK=36 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=34 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=98 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=99 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=39 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=37 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=22 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=16 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=20 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=19 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) 
UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=101 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=83 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=82 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=49 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=50 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=54 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=55 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=18 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=17 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=86 STAGE=21 LAYERS=2 
[45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=84 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=23 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=114 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=118 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=113 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=119 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=21 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=51 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=87 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) 
[2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=14 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=12 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=13 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=48 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=81 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=90 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=95 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=89 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=88 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=112 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) 
TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=116 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=115 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=52 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=75 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=79 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=76 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=74 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=63 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=57 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] 
[engine.py:151:__init__] RANK=60 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=9 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=117 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=111 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=104 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=80 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=46 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=43 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=47 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=42 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 
(104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=62 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=59 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=61 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=56 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=15 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=10 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=8 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=93 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=92 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=123 STAGE=30 
LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=122 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=120 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=31 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=27 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=30 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=28 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=53 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=110 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=109 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 
(104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=106 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=85 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=72 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=40 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=58 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=11 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=94 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=121 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=26 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=107 STAGE=26 LAYERS=2 [55, 57) 
STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=108 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=78 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=45 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=41 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=91 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=126 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=125 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=24 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=29 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 
17:38:45,786] [INFO] [engine.py:151:__init__] RANK=105 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=73 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=77 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=44 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=127 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=25 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-01-27 17:38:45,786] [INFO] [engine.py:151:__init__] RANK=124 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) > using checkpoint value 6e-05 for learning rate > using checkpoint value 6e-06 for minimum learning rate > using checkpoint value 3750000 for warmup iterations > using checkpoint value 600000000 for total number of iterations > using checkpoint value cosine for decay style [2022-01-27 17:39:07,329] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 32 [2022-01-27 17:39:08,376] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 39 [2022-01-27 17:39:09,204] 
[INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 8 [2022-01-27 17:39:09,978] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 94 [2022-01-27 17:39:10,191] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 10 [2022-01-27 17:39:10,499] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 32 [2022-01-27 17:39:10,552] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 39 [2022-01-27 17:39:10,661] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 36 [2022-01-27 17:39:10,814] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 52 [2022-01-27 17:39:10,906] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 8 [2022-01-27 17:39:11,159] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 96 [2022-01-27 17:39:11,378] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 94 [2022-01-27 17:39:11,570] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 11 [2022-01-27 17:39:11,617] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 10 [2022-01-27 17:39:11,920] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 115 [2022-01-27 17:39:12,050] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 36 [2022-01-27 17:39:12,116] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 114 [2022-01-27 17:39:12,552] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 119 [2022-01-27 17:39:12,649] [INFO] 
[engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 52 [2022-01-27 17:39:12,709] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 48 [2022-01-27 17:39:12,730] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 78 [2022-01-27 17:39:12,732] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 96 [2022-01-27 17:39:12,745] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 64 [2022-01-27 17:39:12,883] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 99 [2022-01-27 17:39:12,967] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 11 [2022-01-27 17:39:13,124] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 17 [2022-01-27 17:39:13,463] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 103 [2022-01-27 17:39:13,676] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 115 [2022-01-27 17:39:13,810] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 114 [2022-01-27 17:39:13,949] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 41 [2022-01-27 17:39:13,968] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 119 [2022-01-27 17:39:13,985] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 71 [2022-01-27 17:39:14,115] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 48 [2022-01-27 17:39:14,175] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 78 [2022-01-27 17:39:14,217] [INFO] 
[engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 40 [2022-01-27 17:39:14,255] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 64 [2022-01-27 17:39:14,258] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 99 [2022-01-27 17:39:14,269] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 13 [2022-01-27 17:39:14,312] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 27 [2022-01-27 17:39:14,325] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 7 [2022-01-27 17:39:14,441] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 62 [2022-01-27 17:39:14,607] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 17 [2022-01-27 17:39:14,683] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 44 [2022-01-27 17:39:14,842] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 103 [2022-01-27 17:39:15,118] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 70 [2022-01-27 17:39:15,222] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 89 [2022-01-27 17:39:15,249] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 86 [2022-01-27 17:39:15,365] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 73 [2022-01-27 17:39:15,421] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 71 [2022-01-27 17:39:15,607] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 107 [2022-01-27 17:39:15,635] [INFO] 
[engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 95 [2022-01-27 17:39:15,645] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 91 [2022-01-27 17:39:15,650] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 41 [2022-01-27 17:39:15,665] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 13 [2022-01-27 17:39:15,678] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 74 [2022-01-27 17:39:15,775] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 60 [2022-01-27 17:39:15,835] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 40 [2022-01-27 17:39:15,954] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 22 [2022-01-27 17:39:16,088] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 51 [2022-01-27 17:39:16,117] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 44 [2022-01-27 17:39:16,199] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 7 [2022-01-27 17:39:16,231] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 81 [2022-01-27 17:39:16,337] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 27 [2022-01-27 17:39:16,397] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 82 [2022-01-27 17:39:16,443] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 62 [2022-01-27 17:39:16,521] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 70 [2022-01-27 17:39:16,523] [INFO] 
[engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 127 [2022-01-27 17:39:16,695] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 86 [2022-01-27 17:39:16,751] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 117 [2022-01-27 17:39:16,880] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 101 [2022-01-27 17:39:16,915] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 73 [2022-01-27 17:39:17,017] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 46 [2022-01-27 17:39:17,052] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 59 [2022-01-27 17:39:17,068] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 89 [2022-01-27 17:39:17,071] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 92 [2022-01-27 17:39:17,098] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 122 [2022-01-27 17:39:17,114] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 87 [2022-01-27 17:39:17,181] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 60 [2022-01-27 17:39:17,222] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 55 [2022-01-27 17:39:17,237] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 47 [2022-01-27 17:39:17,251] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 91 [2022-01-27 17:39:17,324] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 74 [2022-01-27 17:39:17,374] [INFO] 
[engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 22 [2022-01-27 17:39:17,472] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 68 [2022-01-27 17:39:17,575] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 107 [2022-01-27 17:39:17,637] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 45 [2022-01-27 17:39:17,676] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 51 [2022-01-27 17:39:17,689] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 80 [2022-01-27 17:39:17,755] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 28 [2022-01-27 17:39:17,794] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 81 [2022-01-27 17:39:17,797] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 95 [2022-01-27 17:39:17,834] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 57 [2022-01-27 17:39:17,866] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 82 [2022-01-27 17:39:17,984] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 63 [2022-01-27 17:39:18,054] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 116 [2022-01-27 17:39:18,068] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 0 [2022-01-27 17:39:18,078] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 93 [2022-01-27 17:39:18,139] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 117 [2022-01-27 17:39:18,147] [INFO] 
[engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 84 [2022-01-27 17:39:18,269] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 108 [2022-01-27 17:39:18,350] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 101 [2022-01-27 17:39:18,438] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 59 [2022-01-27 17:39:18,527] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 122 [2022-01-27 17:39:18,533] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 30 [2022-01-27 17:39:18,544] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 92 [2022-01-27 17:39:18,560] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 104 [2022-01-27 17:39:18,577] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 87 [2022-01-27 17:39:18,631] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 110 [2022-01-27 17:39:18,655] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 46 [2022-01-27 17:39:18,673] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 83 [2022-01-27 17:39:18,687] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 123 [2022-01-27 17:39:18,774] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 127 [2022-01-27 17:39:18,886] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 26 [2022-01-27 17:39:18,919] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 68 [2022-01-27 17:39:18,933] [INFO] 
[engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 33 [2022-01-27 17:39:18,933] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 47 [2022-01-27 17:39:18,935] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 55 [2022-01-27 17:39:19,021] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 54 [2022-01-27 17:39:19,022] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 16 [2022-01-27 17:39:19,099] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 45 [2022-01-27 17:39:19,164] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 58 [2022-01-27 17:39:19,207] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 80 [2022-01-27 17:39:19,209] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 28 [2022-01-27 17:39:19,286] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 77 [2022-01-27 17:39:19,326] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 49 [2022-01-27 17:39:19,331] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 124 [2022-01-27 17:39:19,470] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 116 [2022-01-27 17:39:19,478] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 67 [2022-01-27 17:39:19,482] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 93 [2022-01-27 17:39:19,531] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 57 [2022-01-27 17:39:19,551] [INFO] 
[engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 90 [2022-01-27 17:39:19,596] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 84 [2022-01-27 17:39:19,657] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 88 [2022-01-27 17:39:19,674] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 63 [2022-01-27 17:39:19,726] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 0 checkpoint version 3.0 [2022-01-27 17:39:19,751] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 126 [2022-01-27 17:39:19,762] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 108 [2022-01-27 17:39:19,775] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 19 [2022-01-27 17:39:19,831] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 98 [2022-01-27 17:39:19,838] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 21 [2022-01-27 17:39:20,025] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 104 [2022-01-27 17:39:20,076] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 37 [2022-01-27 17:39:20,076] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 83 [2022-01-27 17:39:20,116] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 30 [2022-01-27 17:39:20,130] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 110 [2022-01-27 17:39:20,249] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 50 [2022-01-27 17:39:20,359] [INFO] 
[engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 26 [2022-01-27 17:39:20,399] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 4 [2022-01-27 17:39:20,405] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 33 [2022-01-27 17:39:20,418] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 76 [2022-01-27 17:39:20,432] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 54 [2022-01-27 17:39:20,477] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 123 [2022-01-27 17:39:20,525] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 56 [2022-01-27 17:39:20,532] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 72 [2022-01-27 17:39:20,560] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 100 [2022-01-27 17:39:20,620] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 109 [2022-01-27 17:39:20,625] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 16 [2022-01-27 17:39:20,641] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 58 [2022-01-27 17:39:20,677] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 77 [2022-01-27 17:39:20,686] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 69 [2022-01-27 17:39:20,710] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 113 [2022-01-27 17:39:20,812] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 49 [2022-01-27 17:39:20,840] [INFO] 
[engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 3 [2022-01-27 17:39:20,891] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 67 [2022-01-27 17:39:20,934] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 20 [2022-01-27 17:39:20,973] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 90 [2022-01-27 17:39:21,014] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 124 [2022-01-27 17:39:21,026] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 105 [2022-01-27 17:39:21,151] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 88 [2022-01-27 17:39:21,214] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 98 [2022-01-27 17:39:21,227] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 75 [2022-01-27 17:39:21,292] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 38 [2022-01-27 17:39:21,352] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 112 [2022-01-27 17:39:21,386] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 43 [2022-01-27 17:39:21,398] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 97 [2022-01-27 17:39:21,493] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 37 [2022-01-27 17:39:21,535] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 25 [2022-01-27 17:39:21,555] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 126 [2022-01-27 17:39:21,561] [INFO] 
[engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 19 [2022-01-27 17:39:21,672] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 50 [2022-01-27 17:39:21,682] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 15 [2022-01-27 17:39:21,684] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 79 [2022-01-27 17:39:21,687] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 29 [2022-01-27 17:39:21,713] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 31 [2022-01-27 17:39:21,749] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 35 [2022-01-27 17:39:21,779] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 14 [2022-01-27 17:39:21,839] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 4 [2022-01-27 17:39:21,949] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 56 [2022-01-27 17:39:21,954] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 66 [2022-01-27 17:39:21,996] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 24 [2022-01-27 17:39:22,015] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 21 [2022-01-27 17:39:22,021] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 100 [2022-01-27 17:39:22,028] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 85 [2022-01-27 17:39:22,055] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 72 [2022-01-27 17:39:22,058] [INFO] 
[engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 23 [2022-01-27 17:39:22,177] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 61 [2022-01-27 17:39:22,238] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 76 [2022-01-27 17:39:22,246] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 12 [2022-01-27 17:39:22,333] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 106 [2022-01-27 17:39:22,417] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 5 [2022-01-27 17:39:22,444] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 113 [2022-01-27 17:39:22,469] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 42 [2022-01-27 17:39:22,476] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 102 [2022-01-27 17:39:22,480] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 53 [2022-01-27 17:39:22,496] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 69 [2022-01-27 17:39:22,525] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 109 [2022-01-27 17:39:22,542] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 118 [2022-01-27 17:39:22,544] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 105 [2022-01-27 17:39:22,567] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 1 [2022-01-27 17:39:22,626] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 3 [2022-01-27 17:39:22,633] [INFO] 
[engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 65 [2022-01-27 17:39:22,656] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 20 [2022-01-27 17:39:22,719] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 125 [2022-01-27 17:39:22,758] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 34 [2022-01-27 17:39:22,779] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 9 [2022-01-27 17:39:22,840] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 112 [2022-01-27 17:39:22,954] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 75 [2022-01-27 17:39:22,969] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 97 [2022-01-27 17:39:22,972] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 6 [2022-01-27 17:39:22,997] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 2 [2022-01-27 17:39:23,049] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 120 [2022-01-27 17:39:23,083] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 111 [2022-01-27 17:39:23,091] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 31 [2022-01-27 17:39:23,093] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 43 [2022-01-27 17:39:23,095] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 25 [2022-01-27 17:39:23,228] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 18 [2022-01-27 17:39:23,239] [INFO] 
[engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 29 [2022-01-27 17:39:23,242] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 79 [2022-01-27 17:39:23,257] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 15 [2022-01-27 17:39:23,357] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 121 [2022-01-27 17:39:23,423] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 85 [2022-01-27 17:39:23,424] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 66 [2022-01-27 17:39:23,499] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 14 [2022-01-27 17:39:23,536] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 24 [2022-01-27 17:39:23,557] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 23 [2022-01-27 17:39:23,627] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 38 [2022-01-27 17:39:23,713] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 12 [2022-01-27 17:39:23,750] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 106 [2022-01-27 17:39:23,843] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 102 [2022-01-27 17:39:23,859] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 61 [2022-01-27 17:39:23,870] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 53 [2022-01-27 17:39:23,911] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 118 [2022-01-27 17:39:23,946] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero 
partition checkpoints for rank 35 [2022-01-27 17:39:23,956] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 42 [2022-01-27 17:39:24,039] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 65 [2022-01-27 17:39:24,230] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 9 [2022-01-27 17:39:24,367] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 5 [2022-01-27 17:39:24,428] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 1 [2022-01-27 17:39:24,452] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 111 [2022-01-27 17:39:24,556] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 34 [2022-01-27 17:39:24,620] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 18 [2022-01-27 17:39:24,644] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 120 [2022-01-27 17:39:24,790] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 6 [2022-01-27 17:39:24,898] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 125 [2022-01-27 17:39:24,942] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 121 [2022-01-27 17:39:25,117] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 2 successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 at iteration 15554 time (ms) | load-checkpoint: 38027.70 /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/utils.py:279: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of 
the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 125.22432 estimated model parameters without embeddings: 103.368064 [after model, optimizer, and learning rate scheduler are built] datetime: 2022-01-27 17:39:25 > building train, validation, and test datasets ... > datasets target sizes (minimum size): train: 600000000 validation: 20008960 test: 10240 > building train, validation, and test datasets for GPT ... > building dataset index ... reading sizes... reading pointers... reading document index... creating numpy buffer of mmap... creating memory view of numpy buffer... > finished creating indexed dataset in 0.140483 seconds number of documents: 304230423 > dataset split: train: document indices in [0, 288714672) total of 288714672 documents validation: document indices in [288714672, 303926193) total of 15211521 documents test: document indices in [303926193, 304230423) total of 304230 documents > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_doc_idx.npy > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_sample_idx.npy > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_shuffle_idx.npy loaded indexed file in 0.130 seconds total number of samples: 657686117 total number of epochs: 5 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_20008960ns_2048sl_43s_doc_idx.npy > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_20008960ns_2048sl_43s_sample_idx.npy > loading shuffle-idx mapping from 
/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_20008960ns_2048sl_43s_shuffle_idx.npy loaded indexed file in 0.156 seconds total number of samples: 20781483 total number of epochs: 3 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_doc_idx.npy > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_sample_idx.npy > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_shuffle_idx.npy loaded indexed file in 0.072 seconds total number of samples: 137384 total number of epochs: 1 > finished creating GPT datasets ... [after dataloaders are built] datetime: 2022-01-27 17:39:33 done with setup ... training ... Number of parameters: [tensor rank - pipeline rank] w/ and w/o embeddings: time (ms) | model-and-optimizer-setup: 49782.56 | train/valid/test-data-iterators-setup: 7374.83 [003-030] 103.3651B / 103.3651B [003-001] 103.3651B / 103.3651B[001-001] 103.3651B / 103.3651B [003-018] 103.3651B / 103.3651B [002-018] 103.3651B / 103.3651B [003-004] 103.3651B / 103.3651B[001-004] 103.3651B / 103.3651B [001-014] 103.3651B / 103.3651B[003-015] 103.3651B / 103.3651B [001-015] 103.3651B / 103.3651B [001-024] 103.3651B / 103.3651B [001-025] 103.3651B / 103.3651B [002-017] 103.3651B / 103.3651B [001-022] 103.3651B / 103.3651B[003-023] 103.3651B / 103.3651B [002-023] 103.3651B / 103.3651B [002-030] 103.3651B / 103.3651B [003-007] 103.3651B / 103.3651B [001-006] 103.3651B / 103.3651B [002-028] 103.3651B / 103.3651B[002-029] 103.3651B / 103.3651B [001-029] 103.3651B / 103.3651B [003-013] 103.3651B / 103.3651B[003-012] 103.3651B / 103.3651B [002-013] 103.3651B / 103.3651B [002-027] 103.3651B / 103.3651B [001-020] 103.3651B / 103.3651B [003-000] 125.2243B / 103.3681B [001-011] 103.3651B / 
103.3651B[001-010] 103.3651B / 103.3651B[002-011] 103.3651B / 103.3651B [001-005] 103.3651B / 103.3651B [002-004] 103.3651B / 103.3651B [002-005] 103.3651B / 103.3651B [003-014] 103.3651B / 103.3651B [002-014] 103.3651B / 103.3651B[002-015] 103.3651B / 103.3651B [003-003] 103.3651B / 103.3651B[003-002] 103.3651B / 103.3651B[002-002] 103.3651B / 103.3651B [002-024] 103.3651B / 103.3651B[003-024] 103.3651B / 103.3651B [002-016] 103.3651B / 103.3651B [003-017] 103.3651B / 103.3651B [001-023] 103.3651B / 103.3651B [003-022] 103.3651B / 103.3651B[002-022] 103.3651B / 103.3651B [003-031] 125.2273B / 103.3710B [002-031] 125.2273B / 103.3710B [002-006] 103.3651B / 103.3651B [001-028] 103.3651B / 103.3651B [003-028] 103.3651B / 103.3651B [001-013] 103.3651B / 103.3651B [001-012] 103.3651B / 103.3651B [002-026] 103.3651B / 103.3651B [003-026] 103.3651B / 103.3651B [001-021] 103.3651B / 103.3651B [002-020] 103.3651B / 103.3651B[002-021] 103.3651B / 103.3651B [003-020] 103.3651B / 103.3651B [001-000] 125.2243B / 103.3681B[002-000] 125.2243B / 103.3681B[002-001] 103.3651B / 103.3651B [002-008] 103.3651B / 103.3651B [001-008] 103.3651B / 103.3651B [001-019] 103.3651B / 103.3651B [003-010] 103.3651B / 103.3651B[003-011] 103.3651B / 103.3651B[002-010] 103.3651B / 103.3651B [003-005] 103.3651B / 103.3651B [000-014] 103.3651B / 103.3651B [001-003] 103.3651B / 103.3651B [001-002] 103.3651B / 103.3651B [002-003] 103.3651B / 103.3651B [002-025] 103.3651B / 103.3651B [003-025] 103.3651B / 103.3651B [003-016] 103.3651B / 103.3651B [001-017] 103.3651B / 103.3651B [001-016] 103.3651B / 103.3651B [001-030] 103.3651B / 103.3651B [001-007] 103.3651B / 103.3651B [003-029] 103.3651B / 103.3651B [002-012] 103.3651B / 103.3651B [001-026] 103.3651B / 103.3651B[003-027] 103.3651B / 103.3651B [001-027] 103.3651B / 103.3651B [003-021] 103.3651B / 103.3651B [000-001] 103.3651B / 103.3651B [003-008] 103.3651B / 103.3651B [003-009] 103.3651B / 103.3651B[002-009] 103.3651B / 103.3651B[001-009] 103.3651B 
/ 103.3651B [001-018] 103.3651B / 103.3651B [000-004] 103.3651B / 103.3651B [000-002] 103.3651B / 103.3651B [000-024] 103.3651B / 103.3651B [000-017] 103.3651B / 103.3651B [000-022] 103.3651B / 103.3651B [001-031] 125.2273B / 103.3710B [002-007] 103.3651B / 103.3651B[003-006] 103.3651B / 103.3651B [000-028] 103.3651B / 103.3651B [000-013] 103.3651B / 103.3651B [000-026] 103.3651B / 103.3651B [000-020] 103.3651B / 103.3651B[000-021] 103.3651B / 103.3651B [000-000] 125.2243B / 103.3681B [000-008] 103.3651B / 103.3651B [003-019] 103.3651B / 103.3651B [002-019] 103.3651B / 103.3651B [000-010] 103.3651B / 103.3651B [000-005] 103.3651B / 103.3651B [000-025] 103.3651B / 103.3651B [000-016] 103.3651B / 103.3651B [000-023] 103.3651B / 103.3651B [000-030] 103.3651B / 103.3651B[000-031] 125.2273B / 103.3710B [000-006] 103.3651B / 103.3651B[000-007] 103.3651B / 103.3651B [000-029] 103.3651B / 103.3651B [000-012] 103.3651B / 103.3651B [000-027] 103.3651B / 103.3651B [000-009] 103.3651B / 103.3651B [000-019] 103.3651B / 103.3651B[000-018] 103.3651B / 103.3651B [000-003] 103.3651B / 103.3651B [000-015] 103.3651B / 103.3651B [000-011] 103.3651B / 103.3651B [before the start of training step] datetime: 2022-01-27 17:39:33 [2022-01-27 17:39:33,570] [INFO] [checkpointing.py:547:forward] Activation Checkpointing Information [2022-01-27 17:39:33,570] [INFO] [checkpointing.py:548:forward] ----Partition Activations False, CPU CHECKPOINTING False [2022-01-27 17:39:33,570] [INFO] [checkpointing.py:551:forward] ----contiguous Memory Checkpointing False with 64 total layers [2022-01-27 17:39:33,570] [INFO] [checkpointing.py:554:forward] ----Synchronization False [2022-01-27 17:39:33,570] [INFO] [checkpointing.py:555:forward] ----Profiling time in checkpointing False [Rank 125] (after 15555 iterations) memory (MB) | allocated: 13245.3740234375 | max allocated: 20709.72265625 | reserved: 24404.0 | max reserved: 24404.0 [Rank 9] (after 15555 iterations) memory (MB) | allocated: 
10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 11] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 13] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 17] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16957.1767578125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 29] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.8759765625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 25] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16957.1767578125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 21] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16957.1767578125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 33] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 37] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 10] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 124] (after 15555 iterations) memory (MB) | allocated: 13245.3740234375 | max allocated: 20709.72265625 | reserved: 24404.0 | max reserved: 24404.0 [Rank 8] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 126] (after 15555 iterations) memory (MB) | allocated: 13245.3740234375 | max allocated: 20709.72265625 | reserved: 24404.0 | max reserved: 24404.0 [Rank 49] (after 15555 
iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 45] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 65] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0[Rank 69] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16957.1767578125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 41] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 61] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0[Rank 57] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 53] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 30] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.8759765625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 22] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 73] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.8759765625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 14] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16957.1767578125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 38] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max 
reserved: 20072.0 [Rank 18] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16957.1767578125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 26] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16957.58837890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 34] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 77] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 iteration 15555/ 292968 | consumed samples: 31856640 | consumed tokens: 15567060992 | elapsed time per iteration (ms): 253918.6 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.730904E+00 | loss scale: 32768.0 | grad norm: 18166.352 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.008 | TFLOPs: 47.94 | [Rank 127] (after 15555 iterations) memory (MB) | allocated: 13245.3740234375 | max allocated: 20709.03515625 | reserved: 24404.0 | max reserved: 24404.0 [Rank 85] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 81] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 15] (after 15555 iterations) memory (MB) | allocated: 10796.63134765625 | max allocated: 16956.81298828125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 7] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 2] (after 15555 iterations) memory (MB) | allocated: 13207.05322265625 | max allocated: 20670.66943359375 | reserved: 24404.0 | max reserved: 24404.0 [Rank 1] (after 
15555 iterations) memory (MB) | allocated: 13207.05322265625 | max allocated: 20670.66943359375 | reserved: 24404.0 | max reserved: 24404.0 [Rank 3] (after 15555 iterations) memory (MB) | allocated: 13208.978515625 | max allocated: 20672.5947265625 | reserved: 24404.0 | max reserved: 24404.0 [Rank 6] (after 15555 iterations) memory (MB) | allocated: 10796.63134765625 | max allocated: 16956.81298828125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 0] (after 15555 iterations) memory (MB) | allocated: 13209.21044921875 | max allocated: 20672.82666015625 | reserved: 24404.0 | max reserved: 24404.0 [Rank 5] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 4] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 93] (after 15555 iterations) memory (MB) | allocated: 10797.20654296875 | max allocated: 16957.38818359375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 97] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 101] (after 15555 iterations) memory (MB) | allocated: 10797.31689453125 | max allocated: 16957.91015625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 89] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 28] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.8759765625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 113] (after 15555 iterations) memory (MB) | allocated: 10797.20654296875 | max allocated: 16957.38818359375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 24] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max 
reserved: 20072.0 [Rank 117] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 105] (after 15555 iterations) memory (MB) | allocated: 10797.20654296875 | max allocated: 16957.38818359375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 20] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 109] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 121] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 16] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 32] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 12] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 36] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 44] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 68] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 60] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 40] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max 
allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 56] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 48] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 72] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.8759765625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 52] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 64] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 76] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 23] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 39] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 35] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 27] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16957.1767578125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 31] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 43] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 19] (after 15555 iterations) memory 
(MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 51] (after 15555 iterations) memory (MB) | allocated: 10796.63134765625 | max allocated: 16956.81298828125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 47] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 55] (after 15555 iterations) memory (MB) | allocated: 10796.63134765625 | max allocated: 16956.81298828125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 71] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0[Rank 67] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 63] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 87] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 75] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16958.11181640625 | reserved: 20072.0 | max reserved: 20072.0[Rank 79] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 83] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 59] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 107] (after 15555 iterations) memory (MB) | allocated: 10797.31689453125 | max allocated: 16957.91015625 | reserved: 20072.0 | max reserved: 20072.0 
[Rank 46] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 91] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 111] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 103] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 99] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 123] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 95] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 50] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 115] (after 15555 iterations) memory (MB) | allocated: 10797.31689453125 | max allocated: 16957.49853515625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 66] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16957.1767578125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 42] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 70] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 54] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.8759765625 
| reserved: 20072.0 | max reserved: 20072.0 [Rank 58] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 119] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16957.1767578125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 82] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 78] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 74] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.8759765625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 62] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 86] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 102] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 106] (after 15555 iterations) memory (MB) | allocated: 10797.31689453125 | max allocated: 16957.91015625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 94] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 98] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 114] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 118] (after 15555 iterations) memory (MB) | allocated: 
10797.31689453125 | max allocated: 16957.49853515625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 90] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 122] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 110] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 84] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 80] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 100] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 96] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 116] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16957.1767578125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 92] (after 15555 iterations) memory (MB) | allocated: 10797.20654296875 | max allocated: 16957.38818359375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 112] (after 15555 iterations) memory (MB) | allocated: 10797.31689453125 | max allocated: 16957.49853515625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 104] (after 15555 iterations) memory (MB) | allocated: 10797.31689453125 | max allocated: 16957.91015625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 108] (after 15555 iterations) memory (MB) | allocated: 10797.31689453125 | max allocated: 16957.49853515625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 88] 
(after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 120] (after 15555 iterations) memory (MB) | allocated: 10796.58349609375 | max allocated: 16956.76513671875 | reserved: 20072.0 | max reserved: 20072.0 iteration 15556/ 292968 | consumed samples: 31858688 | consumed tokens: 15568945152 | elapsed time per iteration (ms): 182008.2 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.712408E+00 | loss scale: 32768.0 | grad norm: 16690.497 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.011 | TFLOPs: 66.88 | iteration 15557/ 292968 | consumed samples: 31860736 | consumed tokens: 15570829312 | elapsed time per iteration (ms): 159093.1 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.699417E+00 | loss scale: 32768.0 | grad norm: 17521.618 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 76.51 | iteration 15558/ 292968 | consumed samples: 31862784 | consumed tokens: 15572713472 | elapsed time per iteration (ms): 149827.4 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.709048E+00 | loss scale: 32768.0 | grad norm: 22689.557 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 81.25 | iteration 15559/ 292968 | consumed samples: 31864832 | consumed tokens: 15574597632 | elapsed time per iteration (ms): 154389.6 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.710851E+00 | loss scale: 32768.0 | grad norm: 28099.243 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 78.85 | iteration 15560/ 292968 | consumed samples: 31866880 | consumed tokens: 15576481792 | 
elapsed time per iteration (ms): 149817.8 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.714425E+00 | loss scale: 32768.0 | grad norm: 36519.065 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 81.25 | iteration 15561/ 292968 | consumed samples: 31868928 | consumed tokens: 15578365952 | elapsed time per iteration (ms): 142458.5 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.713197E+00 | loss scale: 32768.0 | grad norm: 32794.979 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.45 | iteration 15562/ 292968 | consumed samples: 31870976 | consumed tokens: 15580250112 | elapsed time per iteration (ms): 140292.5 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.720170E+00 | loss scale: 32768.0 | grad norm: 42789.011 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 86.77 | iteration 15563/ 292968 | consumed samples: 31873024 | consumed tokens: 15582134272 | elapsed time per iteration (ms): 148426.4 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.697225E+00 | loss scale: 32768.0 | grad norm: 23147.313 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 82.01 | iteration 15564/ 292968 | consumed samples: 31875072 | consumed tokens: 15584018432 | elapsed time per iteration (ms): 147975.4 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.713751E+00 | loss scale: 32768.0 | grad norm: 38293.777 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 82.26 | iteration 15565/ 292968 | consumed samples: 31877120 | consumed tokens: 
15585902592 | elapsed time per iteration (ms): 141386.1 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.699571E+00 | loss scale: 32768.0 | grad norm: 35165.249 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.10 | iteration 15566/ 292968 | consumed samples: 31879168 | consumed tokens: 15587786752 | elapsed time per iteration (ms): 141072.0 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.723544E+00 | loss scale: 32768.0 | grad norm: 30718.414 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 86.29 | iteration 15567/ 292968 | consumed samples: 31881216 | consumed tokens: 15589670912 | elapsed time per iteration (ms): 144855.0 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.705065E+00 | loss scale: 32768.0 | grad norm: 34943.814 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 84.04 | iteration 15568/ 292968 | consumed samples: 31883264 | consumed tokens: 15591555072 | elapsed time per iteration (ms): 136781.4 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.705029E+00 | loss scale: 32768.0 | grad norm: 26144.530 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 89.00 | iteration 15569/ 292968 | consumed samples: 31885312 | consumed tokens: 15593439232 | elapsed time per iteration (ms): 135376.0 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.691973E+00 | loss scale: 32768.0 | grad norm: 30448.872 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 89.92 | iteration 15570/ 292968 | consumed samples: 31887360 | 
consumed tokens: 15595323392 | elapsed time per iteration (ms): 132666.1 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.696009E+00 | loss scale: 32768.0 | grad norm: 34359.967 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 91.76 | iteration 15571/ 292968 | consumed samples: 31889408 | consumed tokens: 15597207552 | elapsed time per iteration (ms): 135735.9 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.706168E+00 | loss scale: 32768.0 | grad norm: 23983.076 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 89.68 | iteration 15572/ 292968 | consumed samples: 31891456 | consumed tokens: 15599091712 | elapsed time per iteration (ms): 133966.1 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.693902E+00 | loss scale: 32768.0 | grad norm: 24657.322 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 90.87 | iteration 15573/ 292968 | consumed samples: 31893504 | consumed tokens: 15600975872 | elapsed time per iteration (ms): 133670.0 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.684850E+00 | loss scale: 32768.0 | grad norm: 26258.447 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 91.07 | iteration 15574/ 292968 | consumed samples: 31895552 | consumed tokens: 15602860032 | elapsed time per iteration (ms): 133588.5 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.709208E+00 | loss scale: 32768.0 | grad norm: 28560.855 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 91.12 | iteration 15575/ 292968 | consumed samples: 
31897600 | consumed tokens: 15604744192 | elapsed time per iteration (ms): 133137.7 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.725251E+00 | loss scale: 32768.0 | grad norm: 29877.934 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 91.43 | iteration 15576/ 292968 | consumed samples: 31899648 | consumed tokens: 15606628352 | elapsed time per iteration (ms): 132238.5 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.689173E+00 | loss scale: 32768.0 | grad norm: 31173.333 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 92.05 | iteration 15577/ 292968 | consumed samples: 31901696 | consumed tokens: 15608512512 | elapsed time per iteration (ms): 127428.5 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.697196E+00 | loss scale: 32768.0 | grad norm: 34119.467 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.53 | iteration 15578/ 292968 | consumed samples: 31903744 | consumed tokens: 15610396672 | elapsed time per iteration (ms): 129372.5 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.721995E+00 | loss scale: 32768.0 | grad norm: 35693.197 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.09 | iteration 15579/ 292968 | consumed samples: 31905792 | consumed tokens: 15612280832 | elapsed time per iteration (ms): 127529.8 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.719062E+00 | loss scale: 32768.0 | grad norm: 24020.041 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.45 | iteration 15580/ 292968 | 
consumed samples: 31907840 | consumed tokens: 15614164992 | elapsed time per iteration (ms): 127972.9 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.728769E+00 | loss scale: 32768.0 | grad norm: 28354.112 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.12 | iteration 15581/ 292968 | consumed samples: 31909888 | consumed tokens: 15616049152 | elapsed time per iteration (ms): 128118.8 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.708459E+00 | loss scale: 32768.0 | grad norm: 37158.217 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.01 | iteration 15582/ 292968 | consumed samples: 31911936 | consumed tokens: 15617933312 | elapsed time per iteration (ms): 126389.8 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.674977E+00 | loss scale: 32768.0 | grad norm: 26058.220 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.31 | iteration 15583/ 292968 | consumed samples: 31913984 | consumed tokens: 15619817472 | elapsed time per iteration (ms): 125796.9 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.703595E+00 | loss scale: 32768.0 | grad norm: 23111.092 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.77 | iteration 15584/ 292968 | consumed samples: 31916032 | consumed tokens: 15621701632 | elapsed time per iteration (ms): 126653.9 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.728414E+00 | loss scale: 32768.0 | grad norm: 27474.036 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.11 | iteration 
15585/ 292968 | consumed samples: 31918080 | consumed tokens: 15623585792 | elapsed time per iteration (ms): 124951.9 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.706249E+00 | loss scale: 32768.0 | grad norm: 21961.767 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.42 | iteration 15586/ 292968 | consumed samples: 31920128 | consumed tokens: 15625469952 | elapsed time per iteration (ms): 127012.8 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.701878E+00 | loss scale: 32768.0 | grad norm: 18138.232 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.84 | iteration 15587/ 292968 | consumed samples: 31922176 | consumed tokens: 15627354112 | elapsed time per iteration (ms): 126044.2 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.702390E+00 | loss scale: 32768.0 | grad norm: 28088.091 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.58 | iteration 15588/ 292968 | consumed samples: 31924224 | consumed tokens: 15629238272 | elapsed time per iteration (ms): 125792.2 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.704686E+00 | loss scale: 32768.0 | grad norm: 22162.765 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.77 | iteration 15589/ 292968 | consumed samples: 31926272 | consumed tokens: 15631122432 | elapsed time per iteration (ms): 125486.3 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.709743E+00 | loss scale: 32768.0 | grad norm: 19756.257 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.01 | 
iteration 15590/ 292968 | consumed samples: 31928320 | consumed tokens: 15633006592 | elapsed time per iteration (ms): 125874.2 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.713119E+00 | loss scale: 32768.0 | grad norm: 18220.102 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.71 | iteration 15591/ 292968 | consumed samples: 31930368 | consumed tokens: 15634890752 | elapsed time per iteration (ms): 126056.5 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.711478E+00 | loss scale: 32768.0 | grad norm: 21687.515 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.57 | iteration 15592/ 292968 | consumed samples: 31932416 | consumed tokens: 15636774912 | elapsed time per iteration (ms): 124724.5 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.704076E+00 | loss scale: 32768.0 | grad norm: 31535.993 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.60 | iteration 15593/ 292968 | consumed samples: 31934464 | consumed tokens: 15638659072 | elapsed time per iteration (ms): 124539.5 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.709136E+00 | loss scale: 32768.0 | grad norm: 36364.024 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.74 | iteration 15594/ 292968 | consumed samples: 31936512 | consumed tokens: 15640543232 | elapsed time per iteration (ms): 125335.8 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.720957E+00 | loss scale: 32768.0 | grad norm: 28349.969 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | 
TFLOPs: 97.12 | iteration 15595/ 292968 | consumed samples: 31938560 | consumed tokens: 15642427392 | elapsed time per iteration (ms): 126495.8 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.705172E+00 | loss scale: 32768.0 | grad norm: 19034.062 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.23 | iteration 15596/ 292968 | consumed samples: 31940608 | consumed tokens: 15644311552 | elapsed time per iteration (ms): 123064.4 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.693616E+00 | loss scale: 32768.0 | grad norm: 20727.435 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.92 | iteration 15597/ 292968 | consumed samples: 31942656 | consumed tokens: 15646195712 | elapsed time per iteration (ms): 124327.9 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.713054E+00 | loss scale: 32768.0 | grad norm: 30397.490 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.91 | iteration 15598/ 292968 | consumed samples: 31944704 | consumed tokens: 15648079872 | elapsed time per iteration (ms): 127140.5 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.695178E+00 | loss scale: 32768.0 | grad norm: 32464.165 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.74 | iteration 15599/ 292968 | consumed samples: 31946752 | consumed tokens: 15649964032 | elapsed time per iteration (ms): 124686.1 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.700643E+00 | loss scale: 32768.0 | grad norm: 34456.707 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 0.016 | TFLOPs: 97.63 |
iteration 15600/ 292968 | consumed samples: 31948800 | consumed tokens: 15651848192 | elapsed time per iteration (ms): 129296.9 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.719775E+00 | loss scale: 32768.0 | grad norm: 24781.267 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.15 |
-------------------------------------------------------------------------------------------
valid loss at iteration 15600 | lm loss value: 3.163243E+00 | lm loss PPL: 2.364716E+01 |
-------------------------------------------------------------------------------------------
saving checkpoint at iteration 15600 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
[2022-01-27 19:30:50,163] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/mp_rank_01_model_states.pt
[2022-01-27 19:30:50,401] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/mp_rank_00_model_states.pt
[2022-01-27 19:31:09,021] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_30_optim_states.pt
[2022-01-27 19:31:11,410] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_118_optim_states.pt
[2022-01-27 19:31:12,250] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_23_optim_states.pt
[2022-01-27 19:31:12,663] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-01-27 19:31:12,742] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-01-27 19:31:13,289] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-01-27 19:31:13,296] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-01-27 19:31:13,602] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-01-27 19:31:13,777] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-01-27 19:31:13,898] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-01-27 19:31:13,949] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-01-27 19:31:14,246] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-01-27 19:31:14,489] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-01-27 19:31:14,508] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-01-27 19:31:14,560] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-01-27 19:31:14,631] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-01-27 19:31:14,762] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-01-27 19:31:14,872] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-01-27 19:31:15,062] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-01-27 19:31:15,082] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-01-27 19:31:15,109] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-01-27 19:31:15,162] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-01-27 19:31:15,356] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-01-27 19:31:15,469] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-01-27 19:31:15,498] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-01-27 19:31:15,559] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-01-27 19:31:16,298] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-01-27 19:31:16,663] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-01-27 19:31:16,662] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-01-27 19:31:16,752] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-01-27 
19:31:17,292] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-01-27 19:31:17,312] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-01-27 19:31:17,318] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-01-27 19:31:17,338] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-01-27 19:31:17,401] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-01-27 19:31:17,431] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-01-27 19:31:17,489] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-01-27 19:31:17,582] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-01-27 19:31:17,571] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-01-27 19:31:17,619] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-01-27 19:31:17,614] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-01-27 19:31:17,639] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-01-27 19:31:17,743] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-01-27 19:31:17,790] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-01-27 19:31:17,952] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-01-27 19:31:17,986] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-01-27 19:31:18,653] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-01-27 19:31:18,685] [INFO] [engine.py:3007:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-01-27 19:31:18,774] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-01-27 19:31:18,805] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-01-27 19:31:18,822] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-01-27 19:31:18,877] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-01-27 19:31:18,953] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-01-27 19:31:19,041] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-01-27 19:31:19,166] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-01-27 19:31:19,314] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-01-27 19:31:19,346] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-01-27 19:31:19,469] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-01-27 19:31:19,558] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-01-27 19:31:19,602] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-01-27 19:31:20,149] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-01-27 19:31:20,365] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-01-27 19:31:20,393] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-01-27 19:31:20,397] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-01-27 19:31:20,420] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-01-27 
19:31:20,678] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-01-27 19:31:20,838] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-01-27 19:31:20,866] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-01-27 19:31:21,016] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-01-27 19:31:21,096] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-01-27 19:31:21,279] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-01-27 19:31:21,467] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-01-27 19:31:21,519] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-01-27 19:31:21,530] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-01-27 19:31:21,564] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-01-27 19:31:21,638] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-01-27 19:31:21,661] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-01-27 19:31:22,104] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-01-27 19:31:21,987] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-01-27 19:31:22,178] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-01-27 19:31:22,049] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-01-27 19:31:22,069] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-01-27 19:31:22,467] [INFO] [engine.py:3007:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-01-27 19:31:22,533] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-01-27 19:31:23,080] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-01-27 19:31:23,093] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-01-27 19:31:23,403] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-01-27 19:31:23,441] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-01-27 19:31:23,659] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-01-27 19:31:23,874] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-01-27 19:31:24,358] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-01-27 19:31:24,410] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-01-27 19:31:24,647] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-01-27 19:31:24,740] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-01-27 19:31:24,789] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-01-27 19:31:25,470] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-01-27 19:31:25,619] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-01-27 19:31:25,732] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-01-27 19:31:25,763] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-01-27 19:31:25,891] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-01-27 
19:31:25,896] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-01-27 19:31:25,968] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-01-27 19:31:26,098] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-01-27 19:31:26,049] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-01-27 19:31:26,060] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-01-27 19:31:26,276] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-01-27 19:31:26,288] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-01-27 19:31:26,486] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-01-27 19:31:26,560] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-01-27 19:31:26,363] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-01-27 19:31:26,580] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-01-27 19:31:26,683] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-01-27 19:31:26,952] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-01-27 19:31:27,113] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-01-27 19:31:27,199] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-01-27 19:31:27,264] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-01-27 19:31:27,269] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-01-27 19:31:27,377] [INFO] [engine.py:3007:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-01-27 19:31:27,952] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-01-27 19:31:28,085] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-01-27 19:31:28,835] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-01-27 19:31:28,892] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-01-27 19:31:30,194] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-01-27 19:31:30,237] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-01-27 19:31:30,563] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-01-27 19:31:30,675] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-01-27 19:31:33,646] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-01-27 19:31:33,770] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15600/zero_pp_rank_0_mp_rank_124_optim_states.pt successfully saved checkpoint at iteration 15600 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 time (ms) | save-checkpoint: 48557.37 iteration 15601/ 292968 | consumed samples: 31950848 | consumed tokens: 15653732352 | elapsed time per iteration (ms): 590585.8 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.703473E+00 | loss scale: 32768.0 | grad norm: 24238.680 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.003 | TFLOPs: 20.61 | iteration 15602/ 292968 | consumed samples: 31952896 | consumed tokens: 15655616512 | elapsed time per iteration (ms): 154900.7 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.709640E+00 | loss scale: 32768.0 | grad norm: 29408.795 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 78.59 | iteration 15603/ 292968 | consumed samples: 31954944 | consumed tokens: 15657500672 | elapsed time per iteration (ms): 150398.0 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.704059E+00 | loss scale: 32768.0 | grad norm: 26493.831 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 80.94 | iteration 15604/ 292968 | consumed samples: 31956992 | consumed tokens: 15659384832 | elapsed time per iteration (ms): 150001.8 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.714729E+00 | loss 
scale: 32768.0 | grad norm: 25192.810 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 81.15 | iteration 15605/ 292968 | consumed samples: 31959040 | consumed tokens: 15661268992 | elapsed time per iteration (ms): 142746.4 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.709271E+00 | loss scale: 32768.0 | grad norm: 26688.905 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.28 | iteration 15606/ 292968 | consumed samples: 31961088 | consumed tokens: 15663153152 | elapsed time per iteration (ms): 147398.7 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.700458E+00 | loss scale: 32768.0 | grad norm: 36071.998 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 82.59 | iteration 15607/ 292968 | consumed samples: 31963136 | consumed tokens: 15665037312 | elapsed time per iteration (ms): 144435.1 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.713606E+00 | loss scale: 32768.0 | grad norm: 35204.129 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 84.28 | iteration 15608/ 292968 | consumed samples: 31965184 | consumed tokens: 15666921472 | elapsed time per iteration (ms): 140523.5 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.709114E+00 | loss scale: 32768.0 | grad norm: 27329.651 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 86.63 | iteration 15609/ 292968 | consumed samples: 31967232 | consumed tokens: 15668805632 | elapsed time per iteration (ms): 132488.5 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 
2.705441E+00 | loss scale: 32768.0 | grad norm: 21029.695 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 91.88 | iteration 15610/ 292968 | consumed samples: 31969280 | consumed tokens: 15670689792 | elapsed time per iteration (ms): 131550.8 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.737413E+00 | loss scale: 32768.0 | grad norm: 26747.039 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 92.53 | iteration 15611/ 292968 | consumed samples: 31971328 | consumed tokens: 15672573952 | elapsed time per iteration (ms): 128296.2 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.676239E+00 | loss scale: 32768.0 | grad norm: 26978.251 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.88 | iteration 15612/ 292968 | consumed samples: 31973376 | consumed tokens: 15674458112 | elapsed time per iteration (ms): 126728.3 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.728000E+00 | loss scale: 32768.0 | grad norm: 23163.752 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.06 | iteration 15613/ 292968 | consumed samples: 31975424 | consumed tokens: 15676342272 | elapsed time per iteration (ms): 127071.1 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.739911E+00 | loss scale: 32768.0 | grad norm: 21736.034 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.80 | iteration 15614/ 292968 | consumed samples: 31977472 | consumed tokens: 15678226432 | elapsed time per iteration (ms): 127186.4 | learning rate: 5.952E-05 | global batch size: 2048 
| lm loss: 2.701485E+00 | loss scale: 32768.0 | grad norm: 20331.866 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.71 | iteration 15615/ 292968 | consumed samples: 31979520 | consumed tokens: 15680110592 | elapsed time per iteration (ms): 131826.3 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.716445E+00 | loss scale: 32768.0 | grad norm: 22544.454 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 92.34 | iteration 15616/ 292968 | consumed samples: 31981568 | consumed tokens: 15681994752 | elapsed time per iteration (ms): 127910.4 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.743937E+00 | loss scale: 32768.0 | grad norm: 22872.948 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.17 | iteration 15617/ 292968 | consumed samples: 31983616 | consumed tokens: 15683878912 | elapsed time per iteration (ms): 126986.0 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.709032E+00 | loss scale: 32768.0 | grad norm: 19029.247 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.86 | iteration 15618/ 292968 | consumed samples: 31985664 | consumed tokens: 15685763072 | elapsed time per iteration (ms): 127716.6 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.715839E+00 | loss scale: 32768.0 | grad norm: 20915.143 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.31 | iteration 15619/ 292968 | consumed samples: 31987712 | consumed tokens: 15687647232 | elapsed time per iteration (ms): 131322.9 | learning rate: 5.952E-05 | global batch 
size: 2048 | lm loss: 2.709855E+00 | loss scale: 32768.0 | grad norm: 21717.369 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 92.69 | iteration 15620/ 292968 | consumed samples: 31989760 | consumed tokens: 15689531392 | elapsed time per iteration (ms): 131971.7 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.714312E+00 | loss scale: 32768.0 | grad norm: 23867.402 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 92.24 | iteration 15621/ 292968 | consumed samples: 31991808 | consumed tokens: 15691415552 | elapsed time per iteration (ms): 130107.1 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.701241E+00 | loss scale: 32768.0 | grad norm: 27554.712 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 93.56 | iteration 15622/ 292968 | consumed samples: 31993856 | consumed tokens: 15693299712 | elapsed time per iteration (ms): 128233.3 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.705576E+00 | loss scale: 32768.0 | grad norm: 29989.602 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.93 | iteration 15623/ 292968 | consumed samples: 31995904 | consumed tokens: 15695183872 | elapsed time per iteration (ms): 131254.5 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.692346E+00 | loss scale: 32768.0 | grad norm: 29446.983 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 92.74 | iteration 15624/ 292968 | consumed samples: 31997952 | consumed tokens: 15697068032 | elapsed time per iteration (ms): 149123.0 | learning rate: 5.952E-05 | 
global batch size: 2048 | lm loss: 2.687049E+00 | loss scale: 32768.0 | grad norm: 24860.326 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 81.63 | iteration 15625/ 292968 | consumed samples: 32000000 | consumed tokens: 15698952192 | elapsed time per iteration (ms): 131746.2 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.725463E+00 | loss scale: 32768.0 | grad norm: 19227.269 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 92.40 | iteration 15626/ 292968 | consumed samples: 32002048 | consumed tokens: 15700836352 | elapsed time per iteration (ms): 128906.8 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.721774E+00 | loss scale: 32768.0 | grad norm: 21163.882 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.43 | iteration 15627/ 292968 | consumed samples: 32004096 | consumed tokens: 15702720512 | elapsed time per iteration (ms): 129196.6 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.715142E+00 | loss scale: 32768.0 | grad norm: 24161.169 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.22 | iteration 15628/ 292968 | consumed samples: 32006144 | consumed tokens: 15704604672 | elapsed time per iteration (ms): 132383.0 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.715104E+00 | loss scale: 32768.0 | grad norm: 24437.890 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 91.95 | iteration 15629/ 292968 | consumed samples: 32008192 | consumed tokens: 15706488832 | elapsed time per iteration (ms): 130450.6 | learning rate: 
5.952E-05 | global batch size: 2048 | lm loss: 2.704430E+00 | loss scale: 32768.0 | grad norm: 23375.706 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 93.31 | iteration 15630/ 292968 | consumed samples: 32010240 | consumed tokens: 15708372992 | elapsed time per iteration (ms): 130213.3 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.698682E+00 | loss scale: 32768.0 | grad norm: 26956.828 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 93.48 | iteration 15631/ 292968 | consumed samples: 32012288 | consumed tokens: 15710257152 | elapsed time per iteration (ms): 128599.1 | learning rate: 5.952E-05 | global batch size: 2048 | lm loss: 2.724629E+00 | loss scale: 32768.0 | grad norm: 42312.496 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.66 | iteration 15632/ 292968 | consumed samples: 32014336 | consumed tokens: 15712141312 | elapsed time per iteration (ms): 128636.4 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.720647E+00 | loss scale: 32768.0 | grad norm: 28375.086 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.63 | iteration 15633/ 292968 | consumed samples: 32016384 | consumed tokens: 15714025472 | elapsed time per iteration (ms): 128445.3 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.710890E+00 | loss scale: 32768.0 | grad norm: 33697.431 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.77 | iteration 15634/ 292968 | consumed samples: 32018432 | consumed tokens: 15715909632 | elapsed time per iteration (ms): 129314.3 | 
learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.733945E+00 | loss scale: 32768.0 | grad norm: 46698.351 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.13 | iteration 15635/ 292968 | consumed samples: 32020480 | consumed tokens: 15717793792 | elapsed time per iteration (ms): 128794.2 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.700438E+00 | loss scale: 32768.0 | grad norm: 17286.402 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.51 | iteration 15636/ 292968 | consumed samples: 32022528 | consumed tokens: 15719677952 | elapsed time per iteration (ms): 129662.8 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.686888E+00 | loss scale: 32768.0 | grad norm: 39918.484 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 93.88 | iteration 15637/ 292968 | consumed samples: 32024576 | consumed tokens: 15721562112 | elapsed time per iteration (ms): 126423.8 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.716626E+00 | loss scale: 32768.0 | grad norm: 35437.479 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.29 | iteration 15638/ 292968 | consumed samples: 32026624 | consumed tokens: 15723446272 | elapsed time per iteration (ms): 128132.3 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.720040E+00 | loss scale: 32768.0 | grad norm: 26959.703 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.00 | iteration 15639/ 292968 | consumed samples: 32028672 | consumed tokens: 15725330432 | elapsed time per iteration (ms): 
129424.8 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.700730E+00 | loss scale: 32768.0 | grad norm: 30880.377 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.05 | iteration 15640/ 292968 | consumed samples: 32030720 | consumed tokens: 15727214592 | elapsed time per iteration (ms): 127604.1 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.725162E+00 | loss scale: 32768.0 | grad norm: 31238.713 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.40 | iteration 15641/ 292968 | consumed samples: 32032768 | consumed tokens: 15729098752 | elapsed time per iteration (ms): 136737.6 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.712554E+00 | loss scale: 32768.0 | grad norm: 27077.801 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 89.02 | iteration 15642/ 292968 | consumed samples: 32034816 | consumed tokens: 15730982912 | elapsed time per iteration (ms): 127410.7 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.700797E+00 | loss scale: 32768.0 | grad norm: 31196.798 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.54 | iteration 15643/ 292968 | consumed samples: 32036864 | consumed tokens: 15732867072 | elapsed time per iteration (ms): 129182.2 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.697357E+00 | loss scale: 32768.0 | grad norm: 41667.480 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.23 | iteration 15644/ 292968 | consumed samples: 32038912 | consumed tokens: 15734751232 | elapsed time per 
iteration (ms): 127010.8 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.715912E+00 | loss scale: 32768.0 | grad norm: 18612.177 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.84 | iteration 15645/ 292968 | consumed samples: 32040960 | consumed tokens: 15736635392 | elapsed time per iteration (ms): 126473.5 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.709867E+00 | loss scale: 32768.0 | grad norm: 36906.787 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.25 | iteration 15646/ 292968 | consumed samples: 32043008 | consumed tokens: 15738519552 | elapsed time per iteration (ms): 127948.1 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.704947E+00 | loss scale: 32768.0 | grad norm: 30160.045 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.14 | iteration 15647/ 292968 | consumed samples: 32045056 | consumed tokens: 15740403712 | elapsed time per iteration (ms): 125340.7 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.727739E+00 | loss scale: 32768.0 | grad norm: 20281.320 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.12 | iteration 15648/ 292968 | consumed samples: 32047104 | consumed tokens: 15742287872 | elapsed time per iteration (ms): 124353.6 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.747473E+00 | loss scale: 32768.0 | grad norm: 23212.098 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.89 | iteration 15649/ 292968 | consumed samples: 32049152 | consumed tokens: 15744172032 | 
elapsed time per iteration (ms): 127838.0 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.693308E+00 | loss scale: 32768.0 | grad norm: 30561.703 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.22 | iteration 15650/ 292968 | consumed samples: 32051200 | consumed tokens: 15746056192 | elapsed time per iteration (ms): 130741.0 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.672042E+00 | loss scale: 32768.0 | grad norm: 29052.628 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 93.11 | saving checkpoint at iteration 15650 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 [2022-01-27 21:22:12,657] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/mp_rank_01_model_states.pt [2022-01-27 21:22:12,698] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/mp_rank_00_model_states.pt [2022-01-27 21:22:34,186] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-01-27 21:22:34,255] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-01-27 21:22:34,357] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-01-27 21:22:34,368] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-01-27 21:22:34,442] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-01-27 21:22:34,557] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-01-27 21:22:34,784] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-01-27 21:22:35,245] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-01-27 21:22:35,436] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-01-27 21:22:35,765] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-01-27 21:22:36,363] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-01-27 21:22:36,403] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-01-27 21:22:36,461] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-01-27 21:22:36,513] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-01-27 21:22:36,566] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-01-27 21:22:37,275] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-01-27 21:22:37,878] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-01-27 21:22:37,883] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-01-27 21:22:38,103] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-01-27 21:22:38,316] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-01-27 21:22:38,458] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_07_optim_states.pt 
[2022-01-27 21:22:38,747] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-01-27 21:22:38,908] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-01-27 21:22:38,919] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-01-27 21:22:38,938] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-01-27 21:22:38,978] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-01-27 21:22:39,030] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-01-27 21:22:39,173] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-01-27 21:22:39,472] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-01-27 21:22:39,626] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-01-27 21:22:39,810] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-01-27 21:22:39,959] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-01-27 21:22:40,113] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-01-27 21:22:40,397] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-01-27 21:22:40,631] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-01-27 21:22:40,656] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-01-27 21:22:40,810] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-01-27 21:22:40,870] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-01-27 21:22:41,111] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-01-27 21:22:41,191] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-01-27 21:22:41,314] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-01-27 21:22:41,345] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-01-27 21:22:41,466] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-01-27 21:22:41,484] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-01-27 21:22:41,539] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-01-27 21:22:41,606] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-01-27 21:22:41,615] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-01-27 21:22:41,743] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-01-27 21:22:41,865] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-01-27 21:22:41,843] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-01-27 21:22:42,017] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-01-27 21:22:42,147] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-01-27 21:22:42,156] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-01-27 21:22:42,231] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-01-27 21:22:42,484] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-01-27 21:22:42,570] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-01-27 
21:22:42,589] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-01-27 21:22:42,898] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-01-27 21:22:43,119] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-01-27 21:22:43,215] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-01-27 21:22:43,339] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-01-27 21:22:43,453] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-01-27 21:22:43,654] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-01-27 21:22:43,689] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-01-27 21:22:43,711] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-01-27 21:22:43,830] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-01-27 21:22:44,937] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-01-27 21:22:45,228] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-01-27 21:22:45,235] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-01-27 21:22:45,488] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-01-27 21:22:45,674] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-01-27 21:22:46,097] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-01-27 21:22:46,147] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-01-27 21:22:46,953] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-01-27 21:22:47,131] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-01-27 21:22:47,173] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_124_optim_states.pt [2022-01-27 21:22:47,266] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-01-27 21:22:47,345] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-01-27 21:22:47,464] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-01-27 21:22:48,047] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-01-27 21:22:48,234] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-01-27 21:22:48,282] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-01-27 21:22:48,354] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-01-27 21:22:48,900] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-01-27 21:22:49,153] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-01-27 21:22:49,199] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-01-27 21:22:49,468] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-01-27 21:22:49,482] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-01-27 21:22:49,827] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-01-27 21:22:49,857] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-01-27 21:22:49,882] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-01-27 
21:22:49,890] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-01-27 21:22:49,939] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-01-27 21:22:49,986] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-01-27 21:22:50,018] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-01-27 21:22:50,059] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-01-27 21:22:50,123] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-01-27 21:22:50,148] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-01-27 21:22:50,183] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-01-27 21:22:50,197] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-01-27 21:22:50,249] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-01-27 21:22:50,199] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-01-27 21:22:50,527] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-01-27 21:22:50,608] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-01-27 21:22:50,650] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-01-27 21:22:50,560] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-01-27 21:22:50,877] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-01-27 21:22:50,987] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-01-27 21:22:51,032] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-01-27 21:22:51,077] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-01-27 21:22:51,193] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-01-27 21:22:51,245] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-01-27 21:22:51,294] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-01-27 21:22:52,066] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-01-27 21:22:52,076] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-01-27 21:22:52,145] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-01-27 21:22:52,236] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-01-27 21:22:52,236] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-01-27 21:22:52,352] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-01-27 21:22:53,409] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-01-27 21:22:53,566] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-01-27 21:22:53,604] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-01-27 21:22:53,761] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-01-27 21:22:53,768] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-01-27 21:22:54,220] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-01-27 21:22:54,391] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-01-27 
21:22:55,349] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-01-27 21:22:55,385] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15650/zero_pp_rank_0_mp_rank_32_optim_states.pt successfully saved checkpoint at iteration 15650 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 time (ms) | save-checkpoint: 47415.15 iteration 15651/ 292968 | consumed samples: 32053248 | consumed tokens: 15747940352 | elapsed time per iteration (ms): 179872.4 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.724845E+00 | loss scale: 32768.0 | grad norm: 20383.757 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.011 | TFLOPs: 67.68 | iteration 15652/ 292968 | consumed samples: 32055296 | consumed tokens: 15749824512 | elapsed time per iteration (ms): 127064.6 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.702465E+00 | loss scale: 32768.0 | grad norm: 30350.903 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.80 | iteration 15653/ 292968 | consumed samples: 32057344 | consumed tokens: 15751708672 | elapsed time per iteration (ms): 126906.5 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.702310E+00 | loss scale: 32768.0 | grad norm: 34521.745 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.92 | iteration 15654/ 292968 | consumed samples: 32059392 | consumed tokens: 15753592832 | elapsed time per iteration (ms): 132013.2 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 
2.674747E+00 | loss scale: 32768.0 | grad norm: 26717.485 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 92.21 | iteration 15655/ 292968 | consumed samples: 32061440 | consumed tokens: 15755476992 | elapsed time per iteration (ms): 129417.7 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.721923E+00 | loss scale: 32768.0 | grad norm: 26975.843 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.06 | iteration 15656/ 292968 | consumed samples: 32063488 | consumed tokens: 15757361152 | elapsed time per iteration (ms): 129576.4 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.691756E+00 | loss scale: 32768.0 | grad norm: 25875.024 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 93.94 | iteration 15657/ 292968 | consumed samples: 32065536 | consumed tokens: 15759245312 | elapsed time per iteration (ms): 129799.7 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.676146E+00 | loss scale: 32768.0 | grad norm: 22398.464 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 93.78 | iteration 15658/ 292968 | consumed samples: 32067584 | consumed tokens: 15761129472 | elapsed time per iteration (ms): 129475.7 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.711606E+00 | loss scale: 32768.0 | grad norm: 31115.603 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.02 | iteration 15659/ 292968 | consumed samples: 32069632 | consumed tokens: 15763013632 | elapsed time per iteration (ms): 139614.5 | learning rate: 5.951E-05 | global batch size: 2048 
| lm loss: 2.680150E+00 | loss scale: 32768.0 | grad norm: 29254.784 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 87.19 | iteration 15660/ 292968 | consumed samples: 32071680 | consumed tokens: 15764897792 | elapsed time per iteration (ms): 128471.7 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.710973E+00 | loss scale: 32768.0 | grad norm: 23459.692 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.75 | iteration 15661/ 292968 | consumed samples: 32073728 | consumed tokens: 15766781952 | elapsed time per iteration (ms): 128592.9 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.697326E+00 | loss scale: 32768.0 | grad norm: 17936.275 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.66 | iteration 15662/ 292968 | consumed samples: 32075776 | consumed tokens: 15768666112 | elapsed time per iteration (ms): 127932.1 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.692348E+00 | loss scale: 32768.0 | grad norm: 18760.699 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.15 | iteration 15663/ 292968 | consumed samples: 32077824 | consumed tokens: 15770550272 | elapsed time per iteration (ms): 128321.3 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.728463E+00 | loss scale: 32768.0 | grad norm: 19420.668 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.86 | iteration 15664/ 292968 | consumed samples: 32079872 | consumed tokens: 15772434432 | elapsed time per iteration (ms): 128298.0 | learning rate: 5.951E-05 | global batch 
size: 2048 | lm loss: 2.716460E+00 | loss scale: 32768.0 | grad norm: 27078.841 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.88 | iteration 15665/ 292968 | consumed samples: 32081920 | consumed tokens: 15774318592 | elapsed time per iteration (ms): 128251.5 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.735255E+00 | loss scale: 32768.0 | grad norm: 30093.649 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.91 | iteration 15666/ 292968 | consumed samples: 32083968 | consumed tokens: 15776202752 | elapsed time per iteration (ms): 130431.4 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.718506E+00 | loss scale: 32768.0 | grad norm: 27905.980 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 93.33 | iteration 15667/ 292968 | consumed samples: 32086016 | consumed tokens: 15778086912 | elapsed time per iteration (ms): 128750.2 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.702476E+00 | loss scale: 32768.0 | grad norm: 22649.856 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.55 | iteration 15668/ 292968 | consumed samples: 32088064 | consumed tokens: 15779971072 | elapsed time per iteration (ms): 128537.6 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.686366E+00 | loss scale: 32768.0 | grad norm: 26077.883 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.70 | iteration 15669/ 292968 | consumed samples: 32090112 | consumed tokens: 15781855232 | elapsed time per iteration (ms): 129875.3 | learning rate: 5.951E-05 | 
global batch size: 2048 | lm loss: 2.684952E+00 | loss scale: 32768.0 | grad norm: 30543.621 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 93.73 | iteration 15670/ 292968 | consumed samples: 32092160 | consumed tokens: 15783739392 | elapsed time per iteration (ms): 130647.8 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.703972E+00 | loss scale: 32768.0 | grad norm: 31632.978 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 93.17 | iteration 15671/ 292968 | consumed samples: 32094208 | consumed tokens: 15785623552 | elapsed time per iteration (ms): 128935.8 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.700955E+00 | loss scale: 32768.0 | grad norm: 21591.123 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.41 | iteration 15672/ 292968 | consumed samples: 32096256 | consumed tokens: 15787507712 | elapsed time per iteration (ms): 126661.9 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.681099E+00 | loss scale: 32768.0 | grad norm: 24732.910 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.11 | iteration 15673/ 292968 | consumed samples: 32098304 | consumed tokens: 15789391872 | elapsed time per iteration (ms): 128786.2 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.697283E+00 | loss scale: 32768.0 | grad norm: 23534.227 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.52 | iteration 15674/ 292968 | consumed samples: 32100352 | consumed tokens: 15791276032 | elapsed time per iteration (ms): 127998.9 | learning rate: 
5.951E-05 | global batch size: 2048 | lm loss: 2.699588E+00 | loss scale: 32768.0 | grad norm: 20393.394 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.10 | iteration 15675/ 292968 | consumed samples: 32102400 | consumed tokens: 15793160192 | elapsed time per iteration (ms): 126952.2 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.717589E+00 | loss scale: 32768.0 | grad norm: 28225.477 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.89 | iteration 15676/ 292968 | consumed samples: 32104448 | consumed tokens: 15795044352 | elapsed time per iteration (ms): 131732.3 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.726859E+00 | loss scale: 32768.0 | grad norm: 23502.605 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 92.41 | iteration 15677/ 292968 | consumed samples: 32106496 | consumed tokens: 15796928512 | elapsed time per iteration (ms): 131266.8 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.706771E+00 | loss scale: 32768.0 | grad norm: 25777.049 | num zeros: 0.0 | curriculum seqlen: 920 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 92.73 | iteration 15678/ 292968 | consumed samples: 32108544 | consumed tokens: 15798829056 | elapsed time per iteration (ms): 125354.5 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.697261E+00 | loss scale: 32768.0 | grad norm: 22906.945 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.95 | iteration 15679/ 292968 | consumed samples: 32110592 | consumed tokens: 15800729600 | elapsed time per iteration (ms): 126311.0 | 
learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.695087E+00 | loss scale: 32768.0 | grad norm: 15985.920 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.21 | iteration 15680/ 292968 | consumed samples: 32112640 | consumed tokens: 15802630144 | elapsed time per iteration (ms): 125646.1 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.717331E+00 | loss scale: 32768.0 | grad norm: 22199.056 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.73 | iteration 15681/ 292968 | consumed samples: 32114688 | consumed tokens: 15804530688 | elapsed time per iteration (ms): 126613.8 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.785656E+00 | loss scale: 32768.0 | grad norm: 35576.500 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.98 | iteration 15682/ 292968 | consumed samples: 32116736 | consumed tokens: 15806431232 | elapsed time per iteration (ms): 126541.2 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 3.245418E+00 | loss scale: 32768.0 | grad norm: 115424.861 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.03 | iteration 15683/ 292968 | consumed samples: 32118784 | consumed tokens: 15808331776 | elapsed time per iteration (ms): 125708.6 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.810654E+00 | loss scale: 32768.0 | grad norm: 46079.089 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.68 | iteration 15684/ 292968 | consumed samples: 32120832 | consumed tokens: 15810232320 | elapsed time per iteration (ms): 
124587.4 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.757988E+00 | loss scale: 32768.0 | grad norm: 31458.585 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.56 | iteration 15685/ 292968 | consumed samples: 32122880 | consumed tokens: 15812132864 | elapsed time per iteration (ms): 125291.6 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.740662E+00 | loss scale: 32768.0 | grad norm: 22452.762 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.00 | iteration 15686/ 292968 | consumed samples: 32124928 | consumed tokens: 15814033408 | elapsed time per iteration (ms): 123179.4 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.740640E+00 | loss scale: 32768.0 | grad norm: 19074.921 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.68 | iteration 15687/ 292968 | consumed samples: 32126976 | consumed tokens: 15815933952 | elapsed time per iteration (ms): 123064.1 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.727261E+00 | loss scale: 32768.0 | grad norm: 22504.130 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.78 | iteration 15688/ 292968 | consumed samples: 32129024 | consumed tokens: 15817834496 | elapsed time per iteration (ms): 123987.9 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.750262E+00 | loss scale: 32768.0 | grad norm: 24976.656 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.03 | iteration 15689/ 292968 | consumed samples: 32131072 | consumed tokens: 15819735040 | elapsed time per 
iteration (ms): 124484.6 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.717256E+00 | loss scale: 32768.0 | grad norm: 23181.675 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.64 | iteration 15690/ 292968 | consumed samples: 32133120 | consumed tokens: 15821635584 | elapsed time per iteration (ms): 124848.9 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.735281E+00 | loss scale: 32768.0 | grad norm: 20972.556 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.35 | iteration 15691/ 292968 | consumed samples: 32135168 | consumed tokens: 15823536128 | elapsed time per iteration (ms): 124360.7 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.726257E+00 | loss scale: 32768.0 | grad norm: 24868.435 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.74 | iteration 15692/ 292968 | consumed samples: 32137216 | consumed tokens: 15825436672 | elapsed time per iteration (ms): 125126.7 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.709453E+00 | loss scale: 32768.0 | grad norm: 29804.936 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.13 | iteration 15693/ 292968 | consumed samples: 32139264 | consumed tokens: 15827337216 | elapsed time per iteration (ms): 132860.8 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.710013E+00 | loss scale: 32768.0 | grad norm: 27453.035 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 92.42 | iteration 15694/ 292968 | consumed samples: 32141312 | consumed tokens: 15829237760 | 
elapsed time per iteration (ms): 127659.1 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.699054E+00 | loss scale: 32768.0 | grad norm: 23970.081 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.18 | iteration 15695/ 292968 | consumed samples: 32143360 | consumed tokens: 15831138304 | elapsed time per iteration (ms): 124300.0 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.704802E+00 | loss scale: 32768.0 | grad norm: 20762.583 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.78 | iteration 15696/ 292968 | consumed samples: 32145408 | consumed tokens: 15833038848 | elapsed time per iteration (ms): 123792.1 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.709732E+00 | loss scale: 32768.0 | grad norm: 24596.960 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.19 | iteration 15697/ 292968 | consumed samples: 32147456 | consumed tokens: 15834939392 | elapsed time per iteration (ms): 123987.2 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.732747E+00 | loss scale: 32768.0 | grad norm: 34124.174 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.03 | iteration 15698/ 292968 | consumed samples: 32149504 | consumed tokens: 15836839936 | elapsed time per iteration (ms): 127590.5 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.665708E+00 | loss scale: 32768.0 | grad norm: 32018.000 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.24 | iteration 15699/ 292968 | consumed samples: 32151552 | consumed tokens: 
15838740480 | elapsed time per iteration (ms): 124981.3 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.714961E+00 | loss scale: 32768.0 | grad norm: 26263.157 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.25 | iteration 15700/ 292968 | consumed samples: 32153600 | consumed tokens: 15840641024 | elapsed time per iteration (ms): 123622.6 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.736382E+00 | loss scale: 32768.0 | grad norm: 21097.763 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.33 | saving checkpoint at iteration 15700 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 [2022-01-27 23:09:21,331] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/mp_rank_00_model_states.pt [2022-01-27 23:09:21,687] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/mp_rank_01_model_states.pt [2022-01-27 23:09:42,677] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-01-27 23:09:42,982] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-01-27 23:09:43,785] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-01-27 23:09:44,232] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-01-27 23:09:44,384] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-01-27 23:09:44,580] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-01-27 23:09:44,599] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-01-27 23:09:44,600] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-01-27 23:09:44,690] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-01-27 23:09:44,993] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-01-27 23:09:45,092] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-01-27 23:09:45,292] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_104_optim_states.pt 
[2022-01-27 23:09:45,474] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-01-27 23:09:45,912] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-01-27 23:09:46,319] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-01-27 23:09:46,404] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-01-27 23:09:46,652] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-01-27 23:09:46,942] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-01-27 23:09:47,295] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-01-27 23:09:47,332] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-01-27 23:09:47,481] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-01-27 23:09:47,520] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-01-27 23:09:47,619] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-01-27 23:09:48,020] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-01-27 23:09:48,134] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-01-27 23:09:48,173] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-01-27 23:09:48,283] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-01-27 23:09:48,294] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-01-27 23:09:48,406] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-01-27 23:09:48,490] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-01-27 23:09:48,733] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-01-27 23:09:49,100] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-01-27 23:09:49,134] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-01-27 23:09:49,179] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-01-27 23:09:49,444] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-01-27 23:09:49,632] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-01-27 23:09:49,600] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-01-27 23:09:49,787] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-01-27 23:09:49,828] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-01-27 23:09:50,029] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-01-27 23:09:49,929] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-01-27 23:09:50,146] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-01-27 23:09:50,242] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-01-27 23:09:50,330] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-01-27 23:09:50,510] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-01-27 23:09:50,519] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-01-27 23:09:50,511] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-01-27 
23:09:50,611] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-01-27 23:09:51,091] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-01-27 23:09:51,135] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-01-27 23:09:51,158] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-01-27 23:09:51,227] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-01-27 23:09:51,295] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-01-27 23:09:51,346] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-01-27 23:09:51,429] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-01-27 23:09:51,491] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-01-27 23:09:52,186] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-01-27 23:09:52,341] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-01-27 23:09:52,617] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-01-27 23:09:52,739] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-01-27 23:09:52,759] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-01-27 23:09:52,902] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-01-27 23:09:52,985] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-01-27 23:09:53,010] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-01-27 23:09:53,131] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-01-27 23:09:53,283] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-01-27 23:09:53,458] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-01-27 23:09:53,695] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-01-27 23:09:53,740] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-01-27 23:09:53,773] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-01-27 23:09:53,845] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-01-27 23:09:53,895] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-01-27 23:09:54,000] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-01-27 23:09:54,005] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-01-27 23:09:54,069] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-01-27 23:09:54,079] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-01-27 23:09:54,133] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-01-27 23:09:54,154] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-01-27 23:09:54,232] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-01-27 23:09:54,244] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-01-27 23:09:54,337] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-01-27 23:09:54,458] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-01-27 
23:09:54,469] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-01-27 23:09:54,648] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-01-27 23:09:54,811] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-01-27 23:09:55,033] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-01-27 23:09:55,095] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-01-27 23:09:55,256] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-01-27 23:09:55,290] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-01-27 23:09:55,331] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-01-27 23:09:55,487] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-01-27 23:09:55,499] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-01-27 23:09:55,521] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-01-27 23:09:55,525] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-01-27 23:09:55,591] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-01-27 23:09:55,617] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-01-27 23:09:55,633] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-01-27 23:09:55,661] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-01-27 23:09:55,768] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-01-27 23:09:55,840] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-01-27 23:09:55,901] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-01-27 23:09:55,971] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-01-27 23:09:56,002] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-01-27 23:09:56,019] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-01-27 23:09:56,076] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-01-27 23:09:56,106] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-01-27 23:09:56,182] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-01-27 23:09:56,590] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-01-27 23:09:57,168] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-01-27 23:09:57,250] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-01-27 23:09:57,290] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-01-27 23:09:57,307] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-01-27 23:09:57,822] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-01-27 23:09:57,827] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-01-27 23:09:57,927] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-01-27 23:09:57,976] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-01-27 23:09:59,670] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-01-27 
23:09:59,751] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-01-27 23:10:01,056] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-01-27 23:10:01,198] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-01-27 23:10:03,052] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-01-27 23:10:04,192] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-01-27 23:10:05,615] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-01-27 23:10:05,617] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-01-27 23:10:05,976] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-01-27 23:10:06,230] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-01-27 23:10:07,342] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-01-27 23:10:07,614] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15700/zero_pp_rank_0_mp_rank_124_optim_states.pt successfully saved checkpoint at iteration 15700 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 time (ms) | save-checkpoint: 51557.29 iteration 15701/ 292968 | consumed samples: 32155648 | consumed tokens: 15842541568 | elapsed time per iteration (ms): 176625.0 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.703106E+00 | loss scale: 32768.0 | grad norm: 20765.102 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 69.52 | iteration 15702/ 292968 | consumed samples: 32157696 | consumed tokens: 15844442112 | elapsed time per iteration (ms): 123861.5 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.700566E+00 | loss scale: 32768.0 | grad norm: 23413.638 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.13 | iteration 15703/ 292968 | consumed samples: 32159744 | consumed tokens: 15846342656 | elapsed time per iteration (ms): 123609.7 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.729424E+00 | loss scale: 32768.0 | grad norm: 23014.426 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.34 | iteration 15704/ 292968 | consumed samples: 32161792 
| consumed tokens: 15848243200 | elapsed time per iteration (ms): 123662.1 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.722042E+00 | loss scale: 32768.0 | grad norm: 25226.542 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.29 | iteration 15705/ 292968 | consumed samples: 32163840 | consumed tokens: 15850143744 | elapsed time per iteration (ms): 125245.0 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.698535E+00 | loss scale: 32768.0 | grad norm: 20606.103 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.04 | iteration 15706/ 292968 | consumed samples: 32165888 | consumed tokens: 15852044288 | elapsed time per iteration (ms): 125047.4 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.700947E+00 | loss scale: 32768.0 | grad norm: 20167.997 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.19 | iteration 15707/ 292968 | consumed samples: 32167936 | consumed tokens: 15853944832 | elapsed time per iteration (ms): 124235.1 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.751415E+00 | loss scale: 32768.0 | grad norm: 27244.270 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.84 | iteration 15708/ 292968 | consumed samples: 32169984 | consumed tokens: 15855845376 | elapsed time per iteration (ms): 124821.2 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.686051E+00 | loss scale: 32768.0 | grad norm: 35152.132 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.37 | iteration 15709/ 292968 | consumed 
samples: 32172032 | consumed tokens: 15857745920 | elapsed time per iteration (ms): 124353.8 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.699234E+00 | loss scale: 32768.0 | grad norm: 29944.892 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.74 | iteration 15710/ 292968 | consumed samples: 32174080 | consumed tokens: 15859646464 | elapsed time per iteration (ms): 129201.6 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.718598E+00 | loss scale: 65536.0 | grad norm: 30434.456 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.04 | iteration 15711/ 292968 | consumed samples: 32176128 | consumed tokens: 15861547008 | elapsed time per iteration (ms): 124455.4 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.712129E+00 | loss scale: 65536.0 | grad norm: 63237.469 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.66 | iteration 15712/ 292968 | consumed samples: 32178176 | consumed tokens: 15863447552 | elapsed time per iteration (ms): 123269.5 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.740295E+00 | loss scale: 65536.0 | grad norm: 128657.608 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.61 | iteration 15713/ 292968 | consumed samples: 32180224 | consumed tokens: 15865348096 | elapsed time per iteration (ms): 122746.2 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.698574E+00 | loss scale: 65536.0 | grad norm: 67421.546 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.03 | iteration 15714/ 
292968 | consumed samples: 32182272 | consumed tokens: 15867248640 | elapsed time per iteration (ms): 123957.4 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.727134E+00 | loss scale: 65536.0 | grad norm: 132990.973 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.06 | iteration 15715/ 292968 | consumed samples: 32184320 | consumed tokens: 15869149184 | elapsed time per iteration (ms): 124360.6 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.734294E+00 | loss scale: 65536.0 | grad norm: 124053.481 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.74 | iteration 15716/ 292968 | consumed samples: 32186368 | consumed tokens: 15871049728 | elapsed time per iteration (ms): 127000.8 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.709952E+00 | loss scale: 65536.0 | grad norm: 82772.106 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.68 | iteration 15717/ 292968 | consumed samples: 32188416 | consumed tokens: 15872950272 | elapsed time per iteration (ms): 124810.6 | learning rate: 5.951E-05 | global batch size: 2048 | lm loss: 2.681563E+00 | loss scale: 65536.0 | grad norm: 93530.570 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.38 | iteration 15718/ 292968 | consumed samples: 32190464 | consumed tokens: 15874850816 | elapsed time per iteration (ms): 126181.4 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.680599E+00 | loss scale: 65536.0 | grad norm: 72068.633 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.31 | 
iteration 15719/ 292968 | consumed samples: 32192512 | consumed tokens: 15876751360 | elapsed time per iteration (ms): 125579.9 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.703617E+00 | loss scale: 65536.0 | grad norm: 56853.398 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.78 | iteration 15720/ 292968 | consumed samples: 32194560 | consumed tokens: 15878651904 | elapsed time per iteration (ms): 127103.2 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.696939E+00 | loss scale: 65536.0 | grad norm: 73490.016 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.61 | iteration 15721/ 292968 | consumed samples: 32196608 | consumed tokens: 15880552448 | elapsed time per iteration (ms): 124523.9 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.696381E+00 | loss scale: 65536.0 | grad norm: 44322.409 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.61 | iteration 15722/ 292968 | consumed samples: 32198656 | consumed tokens: 15882452992 | elapsed time per iteration (ms): 124169.8 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.702187E+00 | loss scale: 65536.0 | grad norm: 50720.893 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.89 | iteration 15723/ 292968 | consumed samples: 32200704 | consumed tokens: 15884353536 | elapsed time per iteration (ms): 126667.4 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.689649E+00 | loss scale: 65536.0 | grad norm: 62123.250 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | 
TFLOPs: 96.94 | iteration 15724/ 292968 | consumed samples: 32202752 | consumed tokens: 15886254080 | elapsed time per iteration (ms): 123144.9 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.697295E+00 | loss scale: 65536.0 | grad norm: 41528.434 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.71 | iteration 15725/ 292968 | consumed samples: 32204800 | consumed tokens: 15888154624 | elapsed time per iteration (ms): 123334.1 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.721175E+00 | loss scale: 65536.0 | grad norm: 32357.570 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.56 | iteration 15726/ 292968 | consumed samples: 32206848 | consumed tokens: 15890055168 | elapsed time per iteration (ms): 122332.1 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.708838E+00 | loss scale: 65536.0 | grad norm: 55370.484 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.37 | iteration 15727/ 292968 | consumed samples: 32208896 | consumed tokens: 15891955712 | elapsed time per iteration (ms): 129800.4 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.708807E+00 | loss scale: 65536.0 | grad norm: 56676.454 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.60 | iteration 15728/ 292968 | consumed samples: 32210944 | consumed tokens: 15893856256 | elapsed time per iteration (ms): 125180.8 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.666071E+00 | loss scale: 65536.0 | grad norm: 38480.608 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 0.016 | TFLOPs: 98.09 | iteration 15729/ 292968 | consumed samples: 32212992 | consumed tokens: 15895756800 | elapsed time per iteration (ms): 122728.2 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.682330E+00 | loss scale: 65536.0 | grad norm: 40641.123 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.05 | iteration 15730/ 292968 | consumed samples: 32215040 | consumed tokens: 15897657344 | elapsed time per iteration (ms): 123794.3 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.703400E+00 | loss scale: 65536.0 | grad norm: 44916.999 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.19 | iteration 15731/ 292968 | consumed samples: 32217088 | consumed tokens: 15899557888 | elapsed time per iteration (ms): 125770.9 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.663400E+00 | loss scale: 65536.0 | grad norm: 58139.969 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.63 | iteration 15732/ 292968 | consumed samples: 32219136 | consumed tokens: 15901458432 | elapsed time per iteration (ms): 124296.3 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.683798E+00 | loss scale: 65536.0 | grad norm: 90096.769 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.79 | iteration 15733/ 292968 | consumed samples: 32221184 | consumed tokens: 15903358976 | elapsed time per iteration (ms): 124336.3 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.676914E+00 | loss scale: 65536.0 | grad norm: 30206.372 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 0.016 | TFLOPs: 98.75 | iteration 15734/ 292968 | consumed samples: 32223232 | consumed tokens: 15905259520 | elapsed time per iteration (ms): 123614.7 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.701538E+00 | loss scale: 65536.0 | grad norm: 47764.786 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.33 | iteration 15735/ 292968 | consumed samples: 32225280 | consumed tokens: 15907160064 | elapsed time per iteration (ms): 122867.1 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.707022E+00 | loss scale: 65536.0 | grad norm: 81069.106 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.94 | iteration 15736/ 292968 | consumed samples: 32227328 | consumed tokens: 15909060608 | elapsed time per iteration (ms): 122007.4 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.659968E+00 | loss scale: 65536.0 | grad norm: 44160.119 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.64 | iteration 15737/ 292968 | consumed samples: 32229376 | consumed tokens: 15910961152 | elapsed time per iteration (ms): 124102.9 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.682047E+00 | loss scale: 65536.0 | grad norm: 30515.883 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 98.94 | iteration 15738/ 292968 | consumed samples: 32231424 | consumed tokens: 15912861696 | elapsed time per iteration (ms): 124654.2 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.692760E+00 | loss scale: 65536.0 | grad norm: 40182.491 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 0.016 | TFLOPs: 98.50 | iteration 15739/ 292968 | consumed samples: 32233472 | consumed tokens: 15914762240 | elapsed time per iteration (ms): 126341.8 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.716207E+00 | loss scale: 65536.0 | grad norm: 51745.871 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.19 | iteration 15740/ 292968 | consumed samples: 32235520 | consumed tokens: 15916662784 | elapsed time per iteration (ms): 123369.3 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.680081E+00 | loss scale: 65536.0 | grad norm: 53786.824 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.53 | iteration 15741/ 292968 | consumed samples: 32237568 | consumed tokens: 15918563328 | elapsed time per iteration (ms): 123439.5 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.666825E+00 | loss scale: 65536.0 | grad norm: 47513.683 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.47 | iteration 15742/ 292968 | consumed samples: 32239616 | consumed tokens: 15920463872 | elapsed time per iteration (ms): 127085.3 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.693784E+00 | loss scale: 65536.0 | grad norm: 54083.488 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.62 | iteration 15743/ 292968 | consumed samples: 32241664 | consumed tokens: 15922364416 | elapsed time per iteration (ms): 125312.7 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.718916E+00 | loss scale: 65536.0 | grad norm: 61492.766 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.99 | iteration 15744/ 292968 | consumed samples: 32243712 | consumed tokens: 15924264960 | elapsed time per iteration (ms): 125780.0 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.668805E+00 | loss scale: 65536.0 | grad norm: 62265.358 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.62 | iteration 15745/ 292968 | consumed samples: 32245760 | consumed tokens: 15926165504 | elapsed time per iteration (ms): 131501.7 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.702822E+00 | loss scale: 65536.0 | grad norm: 60058.033 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 93.37 | iteration 15746/ 292968 | consumed samples: 32247808 | consumed tokens: 15928066048 | elapsed time per iteration (ms): 125987.4 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.682689E+00 | loss scale: 65536.0 | grad norm: 76149.049 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.46 | iteration 15747/ 292968 | consumed samples: 32249856 | consumed tokens: 15929966592 | elapsed time per iteration (ms): 125333.6 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.717255E+00 | loss scale: 65536.0 | grad norm: 54298.570 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.97 | iteration 15748/ 292968 | consumed samples: 32251904 | consumed tokens: 15931867136 | elapsed time per iteration (ms): 128787.0 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.686649E+00 | loss scale: 65536.0 | grad norm: 56283.620 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.34 | iteration 15749/ 292968 | consumed samples: 32253952 | consumed tokens: 15933767680 | elapsed time per iteration (ms): 125850.9 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.707014E+00 | loss scale: 65536.0 | grad norm: 64316.883 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.57 | iteration 15750/ 292968 | consumed samples: 32256000 | consumed tokens: 15935668224 | elapsed time per iteration (ms): 126056.6 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.664401E+00 | loss scale: 65536.0 | grad norm: 61082.981 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.41 | ------------------------------------------------------------------------------------------- valid loss at iteration 15750 | lm loss value: 3.111634E+00 | lm loss PPL: 2.245771E+01 | ------------------------------------------------------------------------------------------- saving checkpoint at iteration 15750 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 [2022-01-28 01:00:40,072] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/mp_rank_01_model_states.pt [2022-01-28 01:00:40,135] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/mp_rank_00_model_states.pt [2022-01-28 01:01:00,453] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-01-28 01:01:01,617] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-01-28 01:01:02,186] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-01-28 01:01:04,170] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-01-28 01:01:04,213] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-01-28 01:01:04,240] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-01-28 01:01:04,215] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-01-28 01:01:04,288] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-01-28 01:01:04,305] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-01-28 01:01:04,307] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-01-28 01:01:04,579] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-01-28 01:01:04,646] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-01-28 01:01:04,914] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-01-28 01:01:04,992] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-01-28 01:01:04,998] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-01-28 01:01:05,100] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-01-28 01:01:05,134] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-01-28 01:01:05,460] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-01-28 01:01:05,512] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-01-28 
01:01:06,451] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-01-28 01:01:06,536] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-01-28 01:01:06,698] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-01-28 01:01:06,954] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-01-28 01:01:07,040] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-01-28 01:01:07,059] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-01-28 01:01:07,060] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-01-28 01:01:07,298] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-01-28 01:01:07,341] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-01-28 01:01:07,555] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-01-28 01:01:07,628] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-01-28 01:01:07,681] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-01-28 01:01:07,734] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-01-28 01:01:07,750] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-01-28 01:01:07,764] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-01-28 01:01:07,736] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-01-28 01:01:08,219] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-01-28 01:01:08,303] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-01-28 01:01:08,607] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-01-28 01:01:08,643] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-01-28 01:01:08,741] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-01-28 01:01:08,822] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-01-28 01:01:09,452] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-01-28 01:01:09,421] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-01-28 01:01:09,790] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-01-28 01:01:09,848] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-01-28 01:01:09,870] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-01-28 01:01:09,974] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-01-28 01:01:10,014] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-01-28 01:01:10,149] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-01-28 01:01:10,212] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-01-28 01:01:10,248] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-01-28 01:01:10,276] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-01-28 01:01:10,357] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-01-28 01:01:10,369] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-01-28 
01:01:10,477] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-01-28 01:01:10,519] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-01-28 01:01:10,549] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-01-28 01:01:10,587] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-01-28 01:01:10,664] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-01-28 01:01:10,656] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-01-28 01:01:10,790] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-01-28 01:01:10,859] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-01-28 01:01:10,993] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-01-28 01:01:11,030] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-01-28 01:01:11,088] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-01-28 01:01:11,142] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-01-28 01:01:12,038] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-01-28 01:01:12,251] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-01-28 01:01:12,293] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-01-28 01:01:12,320] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-01-28 01:01:12,329] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-01-28 01:01:12,652] [INFO] [engine.py:3007:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-01-28 01:01:13,301] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-01-28 01:01:13,685] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-01-28 01:01:14,474] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-01-28 01:01:14,782] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-01-28 01:01:14,815] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-01-28 01:01:14,823] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-01-28 01:01:14,778] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-01-28 01:01:14,850] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-01-28 01:01:15,002] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-01-28 01:01:15,139] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-01-28 01:01:15,203] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-01-28 01:01:15,413] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-01-28 01:01:15,481] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-01-28 01:01:15,562] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-01-28 01:01:15,647] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-01-28 01:01:15,840] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-01-28 01:01:16,008] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-01-28 
01:01:16,012] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-01-28 01:01:16,198] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-01-28 01:01:16,642] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-01-28 01:01:16,924] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-01-28 01:01:16,929] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-01-28 01:01:16,975] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-01-28 01:01:17,037] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-01-28 01:01:17,247] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-01-28 01:01:17,421] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-01-28 01:01:17,637] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-01-28 01:01:17,711] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-01-28 01:01:17,806] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-01-28 01:01:17,684] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-01-28 01:01:17,888] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-01-28 01:01:17,991] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-01-28 01:01:18,006] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-01-28 01:01:18,033] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-01-28 01:01:18,063] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-01-28 01:01:18,298] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-01-28 01:01:18,341] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-01-28 01:01:18,729] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-01-28 01:01:18,906] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-01-28 01:01:19,512] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-01-28 01:01:19,541] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-01-28 01:01:19,568] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-01-28 01:01:19,587] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-01-28 01:01:19,779] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-01-28 01:01:19,864] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-01-28 01:01:19,919] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-01-28 01:01:21,137] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-01-28 01:01:21,358] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-01-28 01:01:21,387] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-01-28 01:01:21,391] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-01-28 01:01:21,402] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-01-28 01:01:21,505] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-01-28 
01:01:22,512] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_55_optim_states.pt
[2022-01-28 01:01:22,553] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_54_optim_states.pt
[2022-01-28 01:01:26,982] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_125_optim_states.pt
[2022-01-28 01:01:27,143] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15750/zero_pp_rank_0_mp_rank_124_optim_states.pt
successfully saved checkpoint at iteration 15750 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
time (ms) | save-checkpoint: 51950.18
iteration 15751/ 292968 | consumed samples: 32258048 | consumed tokens: 15937568768 | elapsed time per iteration (ms): 570254.4 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.683408E+00 | loss scale: 65536.0 | grad norm: 72167.985 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.004 | TFLOPs: 21.53 |
iteration 15752/ 292968 | consumed samples: 32260096 | consumed tokens: 15939469312 | elapsed time per iteration (ms): 133522.9 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.701755E+00 | loss scale: 65536.0 | grad norm: 52125.598 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 91.96 |
iteration 15753/ 292968 | consumed samples: 32262144 | consumed tokens: 15941369856 | elapsed time per iteration (ms): 133310.6 | learning rate:
5.950E-05 | global batch size: 2048 | lm loss: 2.694185E+00 | loss scale: 65536.0 | grad norm: 50202.377 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 92.11 | iteration 15754/ 292968 | consumed samples: 32264192 | consumed tokens: 15943270400 | elapsed time per iteration (ms): 132077.9 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.665569E+00 | loss scale: 65536.0 | grad norm: 65117.663 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 92.97 | iteration 15755/ 292968 | consumed samples: 32266240 | consumed tokens: 15945170944 | elapsed time per iteration (ms): 134503.1 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.655343E+00 | loss scale: 65536.0 | grad norm: 57995.029 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 91.29 | iteration 15756/ 292968 | consumed samples: 32268288 | consumed tokens: 15947071488 | elapsed time per iteration (ms): 133516.5 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.713712E+00 | loss scale: 65536.0 | grad norm: 43738.935 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 91.96 | iteration 15757/ 292968 | consumed samples: 32270336 | consumed tokens: 15948972032 | elapsed time per iteration (ms): 135241.6 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.684027E+00 | loss scale: 65536.0 | grad norm: 54850.819 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 90.79 | iteration 15758/ 292968 | consumed samples: 32272384 | consumed tokens: 15950872576 | elapsed time per iteration (ms): 140530.2 | 
learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.697895E+00 | loss scale: 65536.0 | grad norm: 54647.692 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 87.38 | iteration 15759/ 292968 | consumed samples: 32274432 | consumed tokens: 15952773120 | elapsed time per iteration (ms): 134708.9 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.687193E+00 | loss scale: 65536.0 | grad norm: 45299.277 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 91.15 | iteration 15760/ 292968 | consumed samples: 32276480 | consumed tokens: 15954673664 | elapsed time per iteration (ms): 132292.8 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.649974E+00 | loss scale: 65536.0 | grad norm: 35428.100 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 92.82 | iteration 15761/ 292968 | consumed samples: 32278528 | consumed tokens: 15956574208 | elapsed time per iteration (ms): 130764.6 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.666897E+00 | loss scale: 65536.0 | grad norm: 36003.170 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 93.90 | iteration 15762/ 292968 | consumed samples: 32280576 | consumed tokens: 15958474752 | elapsed time per iteration (ms): 135145.1 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.719684E+00 | loss scale: 65536.0 | grad norm: 50408.528 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 90.86 | iteration 15763/ 292968 | consumed samples: 32282624 | consumed tokens: 15960375296 | elapsed time per iteration (ms): 
131331.7 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.666784E+00 | loss scale: 65536.0 | grad norm: 54192.832 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 93.49 | iteration 15764/ 292968 | consumed samples: 32284672 | consumed tokens: 15962275840 | elapsed time per iteration (ms): 131864.4 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.681812E+00 | loss scale: 65536.0 | grad norm: 55961.015 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 93.12 | iteration 15765/ 292968 | consumed samples: 32286720 | consumed tokens: 15964176384 | elapsed time per iteration (ms): 129617.5 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.673393E+00 | loss scale: 65536.0 | grad norm: 58691.089 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.73 | iteration 15766/ 292968 | consumed samples: 32288768 | consumed tokens: 15966076928 | elapsed time per iteration (ms): 129000.6 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.657890E+00 | loss scale: 65536.0 | grad norm: 66452.768 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.18 | iteration 15767/ 292968 | consumed samples: 32290816 | consumed tokens: 15967977472 | elapsed time per iteration (ms): 129214.3 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.674819E+00 | loss scale: 65536.0 | grad norm: 58543.442 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.03 | iteration 15768/ 292968 | consumed samples: 32292864 | consumed tokens: 15969878016 | elapsed time per 
iteration (ms): 128041.5 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.645418E+00 | loss scale: 65536.0 | grad norm: 55650.726 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.90 | iteration 15769/ 292968 | consumed samples: 32294912 | consumed tokens: 15971778560 | elapsed time per iteration (ms): 128525.2 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.667620E+00 | loss scale: 65536.0 | grad norm: 61963.905 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.54 | iteration 15770/ 292968 | consumed samples: 32296960 | consumed tokens: 15973679104 | elapsed time per iteration (ms): 130168.7 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.691546E+00 | loss scale: 65536.0 | grad norm: 61737.470 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.33 | iteration 15771/ 292968 | consumed samples: 32299008 | consumed tokens: 15975579648 | elapsed time per iteration (ms): 127637.1 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.653766E+00 | loss scale: 65536.0 | grad norm: 45198.210 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.20 | iteration 15772/ 292968 | consumed samples: 32301056 | consumed tokens: 15977480192 | elapsed time per iteration (ms): 127856.6 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.695296E+00 | loss scale: 65536.0 | grad norm: 41476.592 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.04 | iteration 15773/ 292968 | consumed samples: 32303104 | consumed tokens: 15979380736 | 
elapsed time per iteration (ms): 126477.2 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.687865E+00 | loss scale: 65536.0 | grad norm: 60790.726 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.08 | iteration 15774/ 292968 | consumed samples: 32305152 | consumed tokens: 15981281280 | elapsed time per iteration (ms): 129227.5 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.704042E+00 | loss scale: 65536.0 | grad norm: 76786.266 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.02 | iteration 15775/ 292968 | consumed samples: 32307200 | consumed tokens: 15983181824 | elapsed time per iteration (ms): 126480.9 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.675276E+00 | loss scale: 65536.0 | grad norm: 48765.285 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.08 | iteration 15776/ 292968 | consumed samples: 32309248 | consumed tokens: 15985082368 | elapsed time per iteration (ms): 136347.3 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.684493E+00 | loss scale: 65536.0 | grad norm: 51678.145 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 90.06 | iteration 15777/ 292968 | consumed samples: 32311296 | consumed tokens: 15986982912 | elapsed time per iteration (ms): 126503.4 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.711652E+00 | loss scale: 65536.0 | grad norm: 58313.123 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.06 | iteration 15778/ 292968 | consumed samples: 32313344 | consumed tokens: 
15988883456 | elapsed time per iteration (ms): 128435.7 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.678327E+00 | loss scale: 65536.0 | grad norm: 50878.724 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.60 | iteration 15779/ 292968 | consumed samples: 32315392 | consumed tokens: 15990784000 | elapsed time per iteration (ms): 128803.7 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.697378E+00 | loss scale: 65536.0 | grad norm: 46414.612 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.33 | iteration 15780/ 292968 | consumed samples: 32317440 | consumed tokens: 15992684544 | elapsed time per iteration (ms): 126679.8 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.690092E+00 | loss scale: 65536.0 | grad norm: 66527.890 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.93 | iteration 15781/ 292968 | consumed samples: 32319488 | consumed tokens: 15994585088 | elapsed time per iteration (ms): 125520.2 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.693377E+00 | loss scale: 65536.0 | grad norm: 83551.330 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.82 | iteration 15782/ 292968 | consumed samples: 32321536 | consumed tokens: 15996485632 | elapsed time per iteration (ms): 129692.2 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.695050E+00 | loss scale: 65536.0 | grad norm: 39650.368 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.68 | iteration 15783/ 292968 | consumed samples: 32323584 | 
consumed tokens: 15998386176 | elapsed time per iteration (ms): 130065.6 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.670688E+00 | loss scale: 65536.0 | grad norm: 75841.182 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.40 | iteration 15784/ 292968 | consumed samples: 32325632 | consumed tokens: 16000286720 | elapsed time per iteration (ms): 126994.8 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.684283E+00 | loss scale: 65536.0 | grad norm: 69036.665 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.69 | iteration 15785/ 292968 | consumed samples: 32327680 | consumed tokens: 16002187264 | elapsed time per iteration (ms): 127890.3 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.686394E+00 | loss scale: 65536.0 | grad norm: 48386.468 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.01 | iteration 15786/ 292968 | consumed samples: 32329728 | consumed tokens: 16004087808 | elapsed time per iteration (ms): 126543.6 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.688574E+00 | loss scale: 65536.0 | grad norm: 64385.663 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.03 | iteration 15787/ 292968 | consumed samples: 32331776 | consumed tokens: 16005988352 | elapsed time per iteration (ms): 127793.9 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.670002E+00 | loss scale: 65536.0 | grad norm: 70767.204 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.08 | iteration 15788/ 292968 | consumed samples: 
32333824 | consumed tokens: 16007888896 | elapsed time per iteration (ms): 128943.8 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.674184E+00 | loss scale: 65536.0 | grad norm: 30967.772 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.23 | iteration 15789/ 292968 | consumed samples: 32335872 | consumed tokens: 16009789440 | elapsed time per iteration (ms): 130795.9 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.670779E+00 | loss scale: 65536.0 | grad norm: 43047.102 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 93.88 | iteration 15790/ 292968 | consumed samples: 32337920 | consumed tokens: 16011689984 | elapsed time per iteration (ms): 126807.1 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.715770E+00 | loss scale: 65536.0 | grad norm: 51799.432 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.83 | iteration 15791/ 292968 | consumed samples: 32339968 | consumed tokens: 16013590528 | elapsed time per iteration (ms): 125717.2 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.691998E+00 | loss scale: 65536.0 | grad norm: 51998.065 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.67 | iteration 15792/ 292968 | consumed samples: 32342016 | consumed tokens: 16015491072 | elapsed time per iteration (ms): 126357.3 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.672033E+00 | loss scale: 65536.0 | grad norm: 69116.889 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.18 | iteration 15793/ 292968 | 
consumed samples: 32344064 | consumed tokens: 16017391616 | elapsed time per iteration (ms): 127313.3 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.722970E+00 | loss scale: 65536.0 | grad norm: 70223.782 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.45 | iteration 15794/ 292968 | consumed samples: 32346112 | consumed tokens: 16019292160 | elapsed time per iteration (ms): 133302.6 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.678644E+00 | loss scale: 65536.0 | grad norm: 58238.463 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 92.11 | iteration 15795/ 292968 | consumed samples: 32348160 | consumed tokens: 16021192704 | elapsed time per iteration (ms): 131493.1 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.685365E+00 | loss scale: 65536.0 | grad norm: 34868.790 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 93.38 | iteration 15796/ 292968 | consumed samples: 32350208 | consumed tokens: 16023093248 | elapsed time per iteration (ms): 127559.9 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.646840E+00 | loss scale: 65536.0 | grad norm: 38906.837 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.26 | iteration 15797/ 292968 | consumed samples: 32352256 | consumed tokens: 16024993792 | elapsed time per iteration (ms): 125850.5 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.677294E+00 | loss scale: 65536.0 | grad norm: 54343.063 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.57 | iteration 
15798/ 292968 | consumed samples: 32354304 | consumed tokens: 16026894336 | elapsed time per iteration (ms): 125861.8 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.723079E+00 | loss scale: 65536.0 | grad norm: 54590.371 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.56 |
iteration 15799/ 292968 | consumed samples: 32356352 | consumed tokens: 16028794880 | elapsed time per iteration (ms): 128306.0 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.662372E+00 | loss scale: 65536.0 | grad norm: 67676.931 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.70 |
saving checkpoint at iteration 15800 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
iteration 15800/ 292968 | consumed samples: 32358400 | consumed tokens: 16030695424 | elapsed time per iteration (ms): 130074.6 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.674270E+00 | loss scale: 65536.0 | grad norm: 73678.889 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.40 |
[2022-01-28 02:49:52,790] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/mp_rank_00_model_states.pt
[2022-01-28 02:49:52,854] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/mp_rank_01_model_states.pt
[2022-01-28 02:51:59,869] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_69_optim_states.pt
[2022-01-28 02:52:01,524] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-01-28 02:52:01,941] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-01-28 02:52:02,016] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-01-28 02:52:03,624] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-01-28 02:52:03,669] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-01-28 02:52:03,702] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-01-28 02:52:04,116] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-01-28 02:52:04,389] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-01-28 02:52:04,446] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-01-28 
02:52:04,610] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-01-28 02:52:04,667] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-01-28 02:52:04,863] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-01-28 02:52:04,864] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-01-28 02:52:04,914] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-01-28 02:52:05,093] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-01-28 02:52:05,149] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-01-28 02:52:05,175] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-01-28 02:52:05,177] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-01-28 02:52:05,276] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-01-28 02:52:05,328] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-01-28 02:52:06,217] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-01-28 02:52:06,217] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-01-28 02:52:06,283] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-01-28 02:52:06,319] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-01-28 02:52:06,296] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-01-28 02:52:06,369] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-01-28 02:52:06,630] [INFO] [engine.py:3007:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-01-28 02:52:06,682] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-01-28 02:52:06,829] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-01-28 02:52:06,841] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-01-28 02:52:07,071] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-01-28 02:52:07,107] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-01-28 02:52:07,072] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-01-28 02:52:07,147] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-01-28 02:52:07,221] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-01-28 02:52:07,309] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-01-28 02:52:07,344] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-01-28 02:52:07,461] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-01-28 02:52:07,810] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-01-28 02:52:08,071] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-01-28 02:52:08,355] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-01-28 02:52:08,465] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-01-28 02:52:08,484] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-01-28 02:52:08,629] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-01-28 
02:52:08,714] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-01-28 02:52:08,738] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-01-28 02:52:08,817] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-01-28 02:52:08,823] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-01-28 02:52:08,839] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-01-28 02:52:08,968] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-01-28 02:52:08,996] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-01-28 02:52:09,091] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-01-28 02:52:09,112] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-01-28 02:52:09,187] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-01-28 02:52:09,255] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-01-28 02:52:09,273] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-01-28 02:52:09,303] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-01-28 02:52:09,891] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-01-28 02:52:10,163] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-01-28 02:52:10,188] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-01-28 02:52:10,205] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-01-28 02:52:10,513] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-01-28 02:52:10,547] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-01-28 02:52:11,706] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-01-28 02:52:12,627] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-01-28 02:52:12,636] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-01-28 02:52:13,012] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-01-28 02:52:13,145] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-01-28 02:52:13,157] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-01-28 02:52:13,222] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-01-28 02:52:13,248] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-01-28 02:52:13,454] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-01-28 02:52:13,459] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-01-28 02:52:13,476] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-01-28 02:52:13,766] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-01-28 02:52:13,816] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-01-28 02:52:13,998] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-01-28 02:52:14,012] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-01-28 02:52:14,026] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-01-28 
02:52:14,478] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-01-28 02:52:14,509] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-01-28 02:52:14,526] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-01-28 02:52:14,698] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-01-28 02:52:14,701] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-01-28 02:52:14,753] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-01-28 02:52:14,761] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-01-28 02:52:14,824] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-01-28 02:52:14,922] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-01-28 02:52:15,048] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-01-28 02:52:15,260] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-01-28 02:52:15,519] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-01-28 02:52:15,565] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-01-28 02:52:15,916] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-01-28 02:52:15,987] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-01-28 02:52:15,991] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-01-28 02:52:16,106] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-01-28 02:52:16,115] [INFO] [engine.py:3007:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-01-28 02:52:16,213] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-01-28 02:52:16,410] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-01-28 02:52:16,480] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-01-28 02:52:16,761] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-01-28 02:52:17,020] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-01-28 02:52:17,355] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-01-28 02:52:17,377] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-01-28 02:52:17,416] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-01-28 02:52:17,424] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-01-28 02:52:17,509] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-01-28 02:52:17,550] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_124_optim_states.pt [2022-01-28 02:52:17,555] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-01-28 02:52:17,622] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-01-28 02:52:17,647] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-01-28 02:52:17,781] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-01-28 02:52:17,627] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-01-28 02:52:17,809] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-01-28 
02:52:17,927] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-01-28 02:52:18,085] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-01-28 02:52:18,196] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-01-28 02:52:18,853] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-01-28 02:52:18,961] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-01-28 02:52:19,476] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-01-28 02:52:19,613] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-01-28 02:52:19,807] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-01-28 02:52:19,809] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-01-28 02:52:21,754] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-01-28 02:52:21,856] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-01-28 02:52:23,289] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-01-28 02:52:23,292] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15800/zero_pp_rank_0_mp_rank_117_optim_states.pt successfully saved checkpoint at iteration 15800 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 time (ms) | save-checkpoint: 153971.26 iteration 15801/ 292968 | consumed samples: 32360448 | consumed tokens: 16032595968 | elapsed time per iteration (ms): 285624.8 | learning rate: 5.950E-05 | global batch size: 2048 | lm loss: 2.692744E+00 | loss scale: 65536.0 | grad norm: 37962.417 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.007 | TFLOPs: 42.99 | iteration 15802/ 292968 | consumed samples: 32362496 | consumed tokens: 16034496512 | elapsed time per iteration (ms): 126916.2 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.672558E+00 | loss scale: 65536.0 | grad norm: 49094.423 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.75 | 
iteration 15803/ 292968 | consumed samples: 32364544 | consumed tokens: 16036397056 | elapsed time per iteration (ms): 128470.6 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.681546E+00 | loss scale: 65536.0 | grad norm: 62922.953 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.58 | iteration 15804/ 292968 | consumed samples: 32366592 | consumed tokens: 16038297600 | elapsed time per iteration (ms): 129460.1 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.682196E+00 | loss scale: 65536.0 | grad norm: 56860.883 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.85 | iteration 15805/ 292968 | consumed samples: 32368640 | consumed tokens: 16040198144 | elapsed time per iteration (ms): 127492.9 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.695091E+00 | loss scale: 65536.0 | grad norm: 45020.179 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.31 | iteration 15806/ 292968 | consumed samples: 32370688 | consumed tokens: 16042098688 | elapsed time per iteration (ms): 128763.7 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.630443E+00 | loss scale: 65536.0 | grad norm: 53821.899 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.36 | iteration 15807/ 292968 | consumed samples: 32372736 | consumed tokens: 16043999232 | elapsed time per iteration (ms): 128998.6 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.730872E+00 | loss scale: 65536.0 | grad norm: 61105.946 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | 
TFLOPs: 95.19 | iteration 15808/ 292968 | consumed samples: 32374784 | consumed tokens: 16045899776 | elapsed time per iteration (ms): 128606.8 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.681544E+00 | loss scale: 65536.0 | grad norm: 57547.913 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.48 | iteration 15809/ 292968 | consumed samples: 32376832 | consumed tokens: 16047800320 | elapsed time per iteration (ms): 127393.8 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.670443E+00 | loss scale: 65536.0 | grad norm: 51231.631 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.38 | iteration 15810/ 292968 | consumed samples: 32378880 | consumed tokens: 16049700864 | elapsed time per iteration (ms): 131653.6 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.678859E+00 | loss scale: 65536.0 | grad norm: 56434.280 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 93.27 | iteration 15811/ 292968 | consumed samples: 32380928 | consumed tokens: 16051601408 | elapsed time per iteration (ms): 126039.5 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.675346E+00 | loss scale: 65536.0 | grad norm: 62244.311 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.42 | iteration 15812/ 292968 | consumed samples: 32382976 | consumed tokens: 16053501952 | elapsed time per iteration (ms): 129742.1 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.692650E+00 | loss scale: 65536.0 | grad norm: 51575.643 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 0.016 | TFLOPs: 94.64 | iteration 15813/ 292968 | consumed samples: 32385024 | consumed tokens: 16055402496 | elapsed time per iteration (ms): 124728.8 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.688370E+00 | loss scale: 65536.0 | grad norm: 41795.098 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.44 | iteration 15814/ 292968 | consumed samples: 32387072 | consumed tokens: 16057303040 | elapsed time per iteration (ms): 126081.3 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.693259E+00 | loss scale: 65536.0 | grad norm: 51854.187 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.39 | iteration 15815/ 292968 | consumed samples: 32389120 | consumed tokens: 16059203584 | elapsed time per iteration (ms): 125764.5 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.696913E+00 | loss scale: 65536.0 | grad norm: 53419.187 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.63 | iteration 15816/ 292968 | consumed samples: 32391168 | consumed tokens: 16061104128 | elapsed time per iteration (ms): 124229.2 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.693575E+00 | loss scale: 65536.0 | grad norm: 57609.280 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.84 | iteration 15817/ 292968 | consumed samples: 32393216 | consumed tokens: 16063004672 | elapsed time per iteration (ms): 124962.1 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.673488E+00 | loss scale: 65536.0 | grad norm: 57915.032 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 0.016 | TFLOPs: 98.26 | iteration 15818/ 292968 | consumed samples: 32395264 | consumed tokens: 16064905216 | elapsed time per iteration (ms): 124945.3 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.683271E+00 | loss scale: 65536.0 | grad norm: 52686.742 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.27 | iteration 15819/ 292968 | consumed samples: 32397312 | consumed tokens: 16066805760 | elapsed time per iteration (ms): 125469.2 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.686437E+00 | loss scale: 65536.0 | grad norm: 37966.720 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.86 | iteration 15820/ 292968 | consumed samples: 32399360 | consumed tokens: 16068706304 | elapsed time per iteration (ms): 123683.5 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.688923E+00 | loss scale: 65536.0 | grad norm: 42463.733 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.28 | iteration 15821/ 292968 | consumed samples: 32401408 | consumed tokens: 16070606848 | elapsed time per iteration (ms): 124423.1 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.675535E+00 | loss scale: 65536.0 | grad norm: 61820.269 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.69 | iteration 15822/ 292968 | consumed samples: 32403456 | consumed tokens: 16072507392 | elapsed time per iteration (ms): 123529.9 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.674944E+00 | loss scale: 65536.0 | grad norm: 66284.529 | num zeros: 0.0 | curriculum seqlen: 928 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 0.017 | TFLOPs: 99.40 | iteration 15823/ 292968 | consumed samples: 32405504 | consumed tokens: 16074424320 | elapsed time per iteration (ms): 123636.1 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.688670E+00 | loss scale: 65536.0 | grad norm: 57062.251 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.17 | iteration 15824/ 292968 | consumed samples: 32407552 | consumed tokens: 16076341248 | elapsed time per iteration (ms): 126296.2 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.676324E+00 | loss scale: 65536.0 | grad norm: 45436.221 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.06 | iteration 15825/ 292968 | consumed samples: 32409600 | consumed tokens: 16078258176 | elapsed time per iteration (ms): 126443.1 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.677030E+00 | loss scale: 65536.0 | grad norm: 43940.007 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.95 | iteration 15826/ 292968 | consumed samples: 32411648 | consumed tokens: 16080175104 | elapsed time per iteration (ms): 125460.1 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.689443E+00 | loss scale: 65536.0 | grad norm: 41642.272 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.71 | iteration 15827/ 292968 | consumed samples: 32413696 | consumed tokens: 16082092032 | elapsed time per iteration (ms): 129948.9 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.663138E+00 | loss scale: 65536.0 | grad norm: 41771.393 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.30 | iteration 15828/ 292968 | consumed samples: 32415744 | consumed tokens: 16084008960 | elapsed time per iteration (ms): 126697.4 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.667101E+00 | loss scale: 65536.0 | grad norm: 43006.467 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.75 | iteration 15829/ 292968 | consumed samples: 32417792 | consumed tokens: 16085925888 | elapsed time per iteration (ms): 126588.0 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.702423E+00 | loss scale: 65536.0 | grad norm: 44727.673 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.83 | iteration 15830/ 292968 | consumed samples: 32419840 | consumed tokens: 16087842816 | elapsed time per iteration (ms): 124108.7 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.672921E+00 | loss scale: 65536.0 | grad norm: 55994.964 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.79 | iteration 15831/ 292968 | consumed samples: 32421888 | consumed tokens: 16089759744 | elapsed time per iteration (ms): 123929.8 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.711230E+00 | loss scale: 65536.0 | grad norm: 64589.649 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.93 | iteration 15832/ 292968 | consumed samples: 32423936 | consumed tokens: 16091676672 | elapsed time per iteration (ms): 124994.2 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.704228E+00 | loss scale: 65536.0 | grad norm: 67597.057 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.08 | iteration 15833/ 292968 | consumed samples: 32425984 | consumed tokens: 16093593600 | elapsed time per iteration (ms): 125236.5 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.675212E+00 | loss scale: 65536.0 | grad norm: 69961.415 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.89 | iteration 15834/ 292968 | consumed samples: 32428032 | consumed tokens: 16095510528 | elapsed time per iteration (ms): 127034.2 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.685445E+00 | loss scale: 65536.0 | grad norm: 35618.413 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.49 | iteration 15835/ 292968 | consumed samples: 32430080 | consumed tokens: 16097427456 | elapsed time per iteration (ms): 124837.8 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.695641E+00 | loss scale: 65536.0 | grad norm: 49913.494 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.21 | iteration 15836/ 292968 | consumed samples: 32432128 | consumed tokens: 16099344384 | elapsed time per iteration (ms): 125060.7 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.677601E+00 | loss scale: 65536.0 | grad norm: 68847.165 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.03 | iteration 15837/ 292968 | consumed samples: 32434176 | consumed tokens: 16101261312 | elapsed time per iteration (ms): 125535.8 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.685973E+00 | loss scale: 65536.0 | grad norm: 44298.395 | num zeros: 0.0 | curriculum seqlen: 936 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.65 | iteration 15838/ 292968 | consumed samples: 32436224 | consumed tokens: 16103178240 | elapsed time per iteration (ms): 124139.9 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.685821E+00 | loss scale: 65536.0 | grad norm: 40130.433 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.76 | iteration 15839/ 292968 | consumed samples: 32438272 | consumed tokens: 16105095168 | elapsed time per iteration (ms): 129677.6 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.697963E+00 | loss scale: 65536.0 | grad norm: 53752.536 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.50 | iteration 15840/ 292968 | consumed samples: 32440320 | consumed tokens: 16107012096 | elapsed time per iteration (ms): 125731.0 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.687264E+00 | loss scale: 65536.0 | grad norm: 49533.791 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.50 | iteration 15841/ 292968 | consumed samples: 32442368 | consumed tokens: 16108929024 | elapsed time per iteration (ms): 124743.5 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.679829E+00 | loss scale: 65536.0 | grad norm: 46076.824 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.28 | iteration 15842/ 292968 | consumed samples: 32444416 | consumed tokens: 16110845952 | elapsed time per iteration (ms): 125277.6 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.658489E+00 | loss scale: 65536.0 | grad norm: 53886.959 | num zeros: 0.0 | curriculum seqlen: 
936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.86 | iteration 15843/ 292968 | consumed samples: 32446464 | consumed tokens: 16112762880 | elapsed time per iteration (ms): 124281.8 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.666461E+00 | loss scale: 65536.0 | grad norm: 68354.751 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.65 | iteration 15844/ 292968 | consumed samples: 32448512 | consumed tokens: 16114679808 | elapsed time per iteration (ms): 129215.3 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.676160E+00 | loss scale: 65536.0 | grad norm: 49856.107 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.85 | iteration 15845/ 292968 | consumed samples: 32450560 | consumed tokens: 16116596736 | elapsed time per iteration (ms): 129199.8 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.705413E+00 | loss scale: 65536.0 | grad norm: 46215.275 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.86 | iteration 15846/ 292968 | consumed samples: 32452608 | consumed tokens: 16118513664 | elapsed time per iteration (ms): 128347.8 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.697992E+00 | loss scale: 65536.0 | grad norm: 61654.463 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.49 | iteration 15847/ 292968 | consumed samples: 32454656 | consumed tokens: 16120430592 | elapsed time per iteration (ms): 128158.5 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.702692E+00 | loss scale: 65536.0 | grad norm: 79893.835 | num zeros: 0.0 | 
curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.64 | iteration 15848/ 292968 | consumed samples: 32456704 | consumed tokens: 16122347520 | elapsed time per iteration (ms): 126336.4 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.681474E+00 | loss scale: 65536.0 | grad norm: 33208.891 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.03 | iteration 15849/ 292968 | consumed samples: 32458752 | consumed tokens: 16124264448 | elapsed time per iteration (ms): 126746.0 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.693415E+00 | loss scale: 65536.0 | grad norm: 62330.701 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.71 | iteration 15850/ 292968 | consumed samples: 32460800 | consumed tokens: 16126181376 | elapsed time per iteration (ms): 125772.1 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.678526E+00 | loss scale: 65536.0 | grad norm: 76741.462 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.47 | saving checkpoint at iteration 15850 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 [2022-01-28 04:38:07,383] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/mp_rank_00_model_states.pt [2022-01-28 04:38:07,601] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/mp_rank_01_model_states.pt [2022-01-28 04:39:25,107] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-01-28 04:39:28,540] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-01-28 04:39:28,962] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-01-28 04:39:29,452] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-01-28 04:39:29,637] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-01-28 04:39:29,750] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-01-28 04:39:29,922] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-01-28 04:39:30,278] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-01-28 04:39:30,413] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-01-28 04:39:30,432] [INFO] [engine.py:3007:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-01-28 04:39:30,500] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-01-28 04:39:30,534] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-01-28 04:39:30,639] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-01-28 04:39:30,655] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-01-28 04:39:30,666] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-01-28 04:39:30,705] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-01-28 04:39:30,746] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-01-28 04:39:30,832] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-01-28 04:39:30,987] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-01-28 04:39:31,150] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-01-28 04:39:31,385] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-01-28 04:39:31,442] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-01-28 04:39:31,465] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-01-28 04:39:31,559] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-01-28 04:39:31,715] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-01-28 04:39:31,758] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-01-28 04:39:31,811] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-01-28 
04:39:31,944] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-01-28 04:39:31,953] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-01-28 04:39:31,980] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-01-28 04:39:32,078] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-01-28 04:39:32,390] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-01-28 04:39:32,384] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-01-28 04:39:32,389] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-01-28 04:39:32,469] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-01-28 04:39:32,483] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-01-28 04:39:32,495] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-01-28 04:39:32,653] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-01-28 04:39:32,661] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-01-28 04:39:32,630] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-01-28 04:39:32,655] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-01-28 04:39:32,756] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-01-28 04:39:33,077] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-01-28 04:39:33,461] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-01-28 04:39:33,492] [INFO] [engine.py:3007:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-01-28 04:39:33,704] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-01-28 04:39:33,727] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-01-28 04:39:33,806] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-01-28 04:39:33,942] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-01-28 04:39:33,954] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-01-28 04:39:34,202] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-01-28 04:39:34,225] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-01-28 04:39:34,243] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-01-28 04:39:34,283] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-01-28 04:39:34,357] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-01-28 04:39:34,396] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-01-28 04:39:34,404] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-01-28 04:39:34,421] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-01-28 04:39:34,441] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-01-28 04:39:34,483] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-01-28 04:39:34,731] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-01-28 04:39:34,797] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-01-28 
04:39:34,797] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-01-28 04:39:34,820] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-01-28 04:39:34,925] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-01-28 04:39:34,962] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-01-28 04:39:34,996] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-01-28 04:39:35,164] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-01-28 04:39:35,173] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-01-28 04:39:35,220] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-01-28 04:39:35,117] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-01-28 04:39:35,245] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-01-28 04:39:35,266] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-01-28 04:39:35,619] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-01-28 04:39:35,864] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-01-28 04:39:36,031] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-01-28 04:39:35,977] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-01-28 04:39:36,667] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-01-28 04:39:37,022] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_124_optim_states.pt [2022-01-28 04:39:36,960] [INFO] [engine.py:3007:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-01-28 04:39:37,072] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-01-28 04:39:37,106] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-01-28 04:39:37,183] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-01-28 04:39:37,562] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-01-28 04:39:37,671] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-01-28 04:39:37,685] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-01-28 04:39:37,691] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-01-28 04:39:37,810] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-01-28 04:39:37,904] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-01-28 04:39:37,978] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-01-28 04:39:37,996] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-01-28 04:39:38,021] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-01-28 04:39:38,080] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-01-28 04:39:38,082] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-01-28 04:39:38,319] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-01-28 04:39:38,404] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-01-28 04:39:38,525] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-01-28 
04:39:38,637] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-01-28 04:39:38,681] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-01-28 04:39:39,115] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-01-28 04:39:39,139] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-01-28 04:39:39,198] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-01-28 04:39:39,435] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-01-28 04:39:39,500] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-01-28 04:39:39,515] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-01-28 04:39:39,560] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-01-28 04:39:39,641] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-01-28 04:39:40,566] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-01-28 04:39:40,570] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-01-28 04:39:40,763] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-01-28 04:39:40,818] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-01-28 04:39:40,888] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-01-28 04:39:40,898] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-01-28 04:39:40,914] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-01-28 04:39:41,095] [INFO] [engine.py:3007:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-01-28 04:39:41,123] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-01-28 04:39:41,302] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-01-28 04:39:41,383] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-01-28 04:39:41,935] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-01-28 04:39:42,082] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-01-28 04:39:42,413] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-01-28 04:39:42,480] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-01-28 04:39:44,819] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-01-28 04:39:44,919] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_77_optim_states.pt
[2022-01-28 04:39:49,537] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_04_optim_states.pt
[2022-01-28 04:39:49,946] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_05_optim_states.pt
[2022-01-28 04:39:52,494] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2022-01-28 04:39:52,594] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15850/zero_pp_rank_0_mp_rank_01_optim_states.pt
successfully saved checkpoint at iteration 15850 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
time (ms) | save-checkpoint: 122858.21
iteration 15851/ 292968 | consumed samples: 32462848 | consumed tokens: 16128098304 | elapsed time per iteration (ms): 250738.6 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.696983E+00 | loss scale: 65536.0 | grad norm: 36333.420 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.008 | TFLOPs: 49.39 |
iteration 15852/ 292968 | consumed samples: 32464896 | consumed tokens: 16130015232 | elapsed time per iteration (ms): 127706.0 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.691415E+00 | loss scale: 65536.0 | grad norm: 106990.608 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan
iterations: 0 | samples per second: 0.016 | TFLOPs: 96.98 | iteration 15853/ 292968 | consumed samples: 32466944 | consumed tokens: 16131932160 | elapsed time per iteration (ms): 124727.4 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.669462E+00 | loss scale: 65536.0 | grad norm: 45407.415 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.29 | iteration 15854/ 292968 | consumed samples: 32468992 | consumed tokens: 16133849088 | elapsed time per iteration (ms): 123363.1 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.682419E+00 | loss scale: 65536.0 | grad norm: 87642.734 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.39 | iteration 15855/ 292968 | consumed samples: 32471040 | consumed tokens: 16135766016 | elapsed time per iteration (ms): 124022.9 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.678237E+00 | loss scale: 65536.0 | grad norm: 28612.057 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.86 | iteration 15856/ 292968 | consumed samples: 32473088 | consumed tokens: 16137682944 | elapsed time per iteration (ms): 123958.0 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.697297E+00 | loss scale: 65536.0 | grad norm: 56080.897 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.91 | iteration 15857/ 292968 | consumed samples: 32475136 | consumed tokens: 16139599872 | elapsed time per iteration (ms): 125098.2 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.691202E+00 | loss scale: 65536.0 | grad norm: 71191.409 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.00 | iteration 15858/ 292968 | consumed samples: 32477184 | consumed tokens: 16141516800 | elapsed time per iteration (ms): 124411.5 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.667895E+00 | loss scale: 65536.0 | grad norm: 57131.015 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.55 | iteration 15859/ 292968 | consumed samples: 32479232 | consumed tokens: 16143433728 | elapsed time per iteration (ms): 123558.0 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.659772E+00 | loss scale: 65536.0 | grad norm: 56646.604 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.23 | iteration 15860/ 292968 | consumed samples: 32481280 | consumed tokens: 16145350656 | elapsed time per iteration (ms): 129587.8 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.699534E+00 | loss scale: 65536.0 | grad norm: 76799.306 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.57 | iteration 15861/ 292968 | consumed samples: 32483328 | consumed tokens: 16147267584 | elapsed time per iteration (ms): 124157.3 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.683677E+00 | loss scale: 65536.0 | grad norm: 75812.296 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.75 | iteration 15862/ 292968 | consumed samples: 32485376 | consumed tokens: 16149184512 | elapsed time per iteration (ms): 123041.8 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.690855E+00 | loss scale: 65536.0 | grad norm: 58026.110 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.65 | iteration 15863/ 292968 | consumed samples: 32487424 | consumed tokens: 16151101440 | elapsed time per iteration (ms): 123796.2 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.681226E+00 | loss scale: 65536.0 | grad norm: 59295.999 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.04 | iteration 15864/ 292968 | consumed samples: 32489472 | consumed tokens: 16153018368 | elapsed time per iteration (ms): 124597.2 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.693608E+00 | loss scale: 65536.0 | grad norm: 66044.879 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.40 | iteration 15865/ 292968 | consumed samples: 32491520 | consumed tokens: 16154935296 | elapsed time per iteration (ms): 122303.3 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.694866E+00 | loss scale: 65536.0 | grad norm: 86725.662 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.26 | iteration 15866/ 292968 | consumed samples: 32493568 | consumed tokens: 16156852224 | elapsed time per iteration (ms): 124274.5 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.709119E+00 | loss scale: 65536.0 | grad norm: 49126.089 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.66 | iteration 15867/ 292968 | consumed samples: 32495616 | consumed tokens: 16158769152 | elapsed time per iteration (ms): 124976.7 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.711809E+00 | loss scale: 65536.0 | grad norm: 139032.522 | num zeros: 0.0 | curriculum seqlen: 936 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.10 | iteration 15868/ 292968 | consumed samples: 32497664 | consumed tokens: 16160686080 | elapsed time per iteration (ms): 122018.6 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.712712E+00 | loss scale: 65536.0 | grad norm: 59383.159 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.50 | iteration 15869/ 292968 | consumed samples: 32499712 | consumed tokens: 16162603008 | elapsed time per iteration (ms): 122567.5 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.744849E+00 | loss scale: 65536.0 | grad norm: 132640.993 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.04 | iteration 15870/ 292968 | consumed samples: 32501760 | consumed tokens: 16164519936 | elapsed time per iteration (ms): 124121.7 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.720863E+00 | loss scale: 65536.0 | grad norm: 112979.433 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.78 | iteration 15871/ 292968 | consumed samples: 32503808 | consumed tokens: 16166436864 | elapsed time per iteration (ms): 127827.7 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.714000E+00 | loss scale: 65536.0 | grad norm: 82486.850 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.89 | iteration 15872/ 292968 | consumed samples: 32505856 | consumed tokens: 16168353792 | elapsed time per iteration (ms): 123686.5 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.706725E+00 | loss scale: 65536.0 | grad norm: 72819.918 | num zeros: 0.0 | curriculum 
seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.13 | iteration 15873/ 292968 | consumed samples: 32507904 | consumed tokens: 16170270720 | elapsed time per iteration (ms): 122946.2 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.668593E+00 | loss scale: 65536.0 | grad norm: 71041.520 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.73 | iteration 15874/ 292968 | consumed samples: 32509952 | consumed tokens: 16172187648 | elapsed time per iteration (ms): 123563.8 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.668697E+00 | loss scale: 65536.0 | grad norm: 50950.223 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.23 | iteration 15875/ 292968 | consumed samples: 32512000 | consumed tokens: 16174104576 | elapsed time per iteration (ms): 122600.4 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.713847E+00 | loss scale: 65536.0 | grad norm: 66681.629 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.02 | iteration 15876/ 292968 | consumed samples: 32514048 | consumed tokens: 16176021504 | elapsed time per iteration (ms): 125136.4 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.697011E+00 | loss scale: 65536.0 | grad norm: 49814.051 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.97 | iteration 15877/ 292968 | consumed samples: 32516096 | consumed tokens: 16177938432 | elapsed time per iteration (ms): 129975.3 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.720295E+00 | loss scale: 65536.0 | grad norm: 59270.136 | num zeros: 
0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.28 | iteration 15878/ 292968 | consumed samples: 32518144 | consumed tokens: 16179855360 | elapsed time per iteration (ms): 123948.0 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.691283E+00 | loss scale: 65536.0 | grad norm: 67869.523 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.92 | iteration 15879/ 292968 | consumed samples: 32520192 | consumed tokens: 16181772288 | elapsed time per iteration (ms): 123948.6 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.694281E+00 | loss scale: 65536.0 | grad norm: 52483.765 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.92 | iteration 15880/ 292968 | consumed samples: 32522240 | consumed tokens: 16183689216 | elapsed time per iteration (ms): 123950.6 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.688785E+00 | loss scale: 65536.0 | grad norm: 37712.983 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.92 | iteration 15881/ 292968 | consumed samples: 32524288 | consumed tokens: 16185606144 | elapsed time per iteration (ms): 124389.6 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.702471E+00 | loss scale: 65536.0 | grad norm: 37421.478 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.56 | iteration 15882/ 292968 | consumed samples: 32526336 | consumed tokens: 16187523072 | elapsed time per iteration (ms): 126772.6 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.656312E+00 | loss scale: 65536.0 | grad norm: 39117.626 | 
num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.69 | iteration 15883/ 292968 | consumed samples: 32528384 | consumed tokens: 16189440000 | elapsed time per iteration (ms): 125628.5 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.709462E+00 | loss scale: 65536.0 | grad norm: 34186.558 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.58 | iteration 15884/ 292968 | consumed samples: 32530432 | consumed tokens: 16191356928 | elapsed time per iteration (ms): 124961.4 | learning rate: 5.949E-05 | global batch size: 2048 | lm loss: 2.674695E+00 | loss scale: 65536.0 | grad norm: 39242.570 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.11 | iteration 15885/ 292968 | consumed samples: 32532480 | consumed tokens: 16193273856 | elapsed time per iteration (ms): 130402.9 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.672719E+00 | loss scale: 65536.0 | grad norm: 41665.971 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.97 | iteration 15886/ 292968 | consumed samples: 32534528 | consumed tokens: 16195190784 | elapsed time per iteration (ms): 124705.8 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.702154E+00 | loss scale: 65536.0 | grad norm: 34156.037 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.31 | iteration 15887/ 292968 | consumed samples: 32536576 | consumed tokens: 16197107712 | elapsed time per iteration (ms): 124779.1 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.665346E+00 | loss scale: 65536.0 | grad norm: 
41119.469 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.25 | iteration 15888/ 292968 | consumed samples: 32538624 | consumed tokens: 16199024640 | elapsed time per iteration (ms): 125312.2 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.692655E+00 | loss scale: 65536.0 | grad norm: 56459.062 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.83 | iteration 15889/ 292968 | consumed samples: 32540672 | consumed tokens: 16200941568 | elapsed time per iteration (ms): 123222.9 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.680012E+00 | loss scale: 65536.0 | grad norm: 72459.406 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.51 | iteration 15890/ 292968 | consumed samples: 32542720 | consumed tokens: 16202858496 | elapsed time per iteration (ms): 123740.1 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.675801E+00 | loss scale: 65536.0 | grad norm: 49974.209 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.09 | iteration 15891/ 292968 | consumed samples: 32544768 | consumed tokens: 16204775424 | elapsed time per iteration (ms): 123206.7 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.683560E+00 | loss scale: 65536.0 | grad norm: 55026.056 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.52 | iteration 15892/ 292968 | consumed samples: 32546816 | consumed tokens: 16206692352 | elapsed time per iteration (ms): 122749.2 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.668155E+00 | loss scale: 65536.0 
| grad norm: 76112.871 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.89 | iteration 15893/ 292968 | consumed samples: 32548864 | consumed tokens: 16208609280 | elapsed time per iteration (ms): 122313.7 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.708838E+00 | loss scale: 65536.0 | grad norm: 57943.848 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.25 | iteration 15894/ 292968 | consumed samples: 32550912 | consumed tokens: 16210526208 | elapsed time per iteration (ms): 130083.3 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.697397E+00 | loss scale: 65536.0 | grad norm: 53602.253 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.21 | iteration 15895/ 292968 | consumed samples: 32552960 | consumed tokens: 16212443136 | elapsed time per iteration (ms): 123256.9 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.679707E+00 | loss scale: 65536.0 | grad norm: 55416.788 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.48 | iteration 15896/ 292968 | consumed samples: 32555008 | consumed tokens: 16214360064 | elapsed time per iteration (ms): 123913.5 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.694247E+00 | loss scale: 65536.0 | grad norm: 51304.370 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.95 | iteration 15897/ 292968 | consumed samples: 32557056 | consumed tokens: 16216276992 | elapsed time per iteration (ms): 123249.4 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.682677E+00 | loss 
scale: 65536.0 | grad norm: 42697.663 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.48 | iteration 15898/ 292968 | consumed samples: 32559104 | consumed tokens: 16218193920 | elapsed time per iteration (ms): 121921.6 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.682737E+00 | loss scale: 65536.0 | grad norm: 41362.589 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.58 | iteration 15899/ 292968 | consumed samples: 32561152 | consumed tokens: 16220110848 | elapsed time per iteration (ms): 121590.4 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.689842E+00 | loss scale: 65536.0 | grad norm: 46996.173 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.86 | iteration 15900/ 292968 | consumed samples: 32563200 | consumed tokens: 16222027776 | elapsed time per iteration (ms): 122942.6 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.684104E+00 | loss scale: 65536.0 | grad norm: 47381.631 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.74 | ------------------------------------------------------------------------------------------- valid loss at iteration 15900 | lm loss value: 3.312987E+00 | lm loss PPL: 2.746704E+01 | ------------------------------------------------------------------------------------------- saving checkpoint at iteration 15900 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 [2022-01-28 06:30:10,175] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/mp_rank_01_model_states.pt 
[2022-01-28 06:30:10,184] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/mp_rank_00_model_states.pt [2022-01-28 06:30:58,605] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-01-28 06:30:59,615] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-01-28 06:31:00,882] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-01-28 06:31:01,233] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-01-28 06:31:01,455] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-01-28 06:31:02,346] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-01-28 06:31:02,428] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-01-28 06:31:02,452] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_24_optim_states.pt 
[2022-01-28 06:31:02,490] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-01-28 06:31:02,641] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-01-28 06:31:02,659] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-01-28 06:31:02,666] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-01-28 06:31:02,741] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-01-28 06:31:02,781] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-01-28 06:31:02,829] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-01-28 06:31:02,954] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-01-28 06:31:02,969] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-01-28 06:31:03,049] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-01-28 06:31:03,045] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-01-28 06:31:03,100] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-01-28 06:31:03,329] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-01-28 06:31:03,378] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-01-28 06:31:03,762] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-01-28 06:31:03,941] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-01-28 06:31:03,956] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-01-28 06:31:04,025] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-01-28 06:31:04,135] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-01-28 06:31:04,142] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-01-28 06:31:04,350] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-01-28 06:31:04,574] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-01-28 06:31:04,615] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-01-28 06:31:04,687] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-01-28 06:31:04,688] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-01-28 06:31:04,757] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-01-28 06:31:05,002] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-01-28 06:31:05,142] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-01-28 06:31:05,160] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-01-28 06:31:05,180] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-01-28 06:31:05,509] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-01-28 06:31:05,567] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-01-28 06:31:05,618] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-01-28 06:31:05,657] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-01-28 06:31:05,845] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-01-28 
06:31:05,865] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-01-28 06:31:06,020] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-01-28 06:31:06,387] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-01-28 06:31:06,707] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-01-28 06:31:06,901] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-01-28 06:31:06,933] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-01-28 06:31:07,091] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-01-28 06:31:07,099] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-01-28 06:31:07,157] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-01-28 06:31:07,214] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-01-28 06:31:07,355] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-01-28 06:31:07,579] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-01-28 06:31:07,703] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-01-28 06:31:07,728] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-01-28 06:31:07,751] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-01-28 06:31:07,797] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-01-28 06:31:07,847] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-01-28 06:31:07,910] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-01-28 06:31:07,975] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-01-28 06:31:08,170] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-01-28 06:31:08,338] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-01-28 06:31:08,366] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-01-28 06:31:08,368] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-01-28 06:31:08,391] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-01-28 06:31:08,491] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-01-28 06:31:09,080] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-01-28 06:31:09,174] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-01-28 06:31:09,178] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-01-28 06:31:09,260] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-01-28 06:31:09,272] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-01-28 06:31:09,324] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-01-28 06:31:09,584] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-01-28 06:31:09,596] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-01-28 06:31:09,857] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-01-28 06:31:10,134] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-01-28 
06:31:10,374] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-01-28 06:31:10,485] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-01-28 06:31:10,592] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-01-28 06:31:10,611] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-01-28 06:31:10,708] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-01-28 06:31:10,741] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-01-28 06:31:11,108] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-01-28 06:31:11,136] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-01-28 06:31:11,849] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-01-28 06:31:12,084] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-01-28 06:31:12,206] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-01-28 06:31:12,061] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-01-28 06:31:12,234] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-01-28 06:31:12,245] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-01-28 06:31:12,711] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-01-28 06:31:12,754] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-01-28 06:31:12,785] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-01-28 06:31:13,546] [INFO] [engine.py:3007:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-01-28 06:31:13,619] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-01-28 06:31:13,704] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-01-28 06:31:13,601] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-01-28 06:31:13,752] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-01-28 06:31:13,844] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-01-28 06:31:14,076] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-01-28 06:31:13,923] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-01-28 06:31:14,206] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-01-28 06:31:14,382] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-01-28 06:31:14,474] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-01-28 06:31:14,510] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-01-28 06:31:14,581] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-01-28 06:31:14,587] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-01-28 06:31:14,635] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-01-28 06:31:14,668] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-01-28 06:31:14,712] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-01-28 06:31:15,138] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-01-28 
06:31:15,402] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-01-28 06:31:15,457] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-01-28 06:31:15,459] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-01-28 06:31:15,629] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-01-28 06:31:15,687] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-01-28 06:31:15,750] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-01-28 06:31:16,012] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-01-28 06:31:16,100] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-01-28 06:31:16,168] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_84_optim_states.pt
[2022-01-28 06:31:16,672] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_58_optim_states.pt
[2022-01-28 06:31:16,935] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_59_optim_states.pt
[2022-01-28 06:31:16,957] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_32_optim_states.pt
[2022-01-28 06:31:16,982] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_33_optim_states.pt
[2022-01-28 06:31:17,497] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_124_optim_states.pt
[2022-01-28 06:31:17,590] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15900/zero_pp_rank_0_mp_rank_125_optim_states.pt
successfully saved checkpoint at iteration 15900 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
time (ms) | save-checkpoint: 79909.94
iteration 15901/ 292968 | consumed samples: 32565248 | consumed tokens: 16223944704 | elapsed time per iteration (ms): 579396.4 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.678542E+00 | loss scale: 65536.0 | grad norm: 47551.015 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 |
samples per second: 0.004 | TFLOPs: 21.38 | iteration 15902/ 292968 | consumed samples: 32567296 | consumed tokens: 16225861632 | elapsed time per iteration (ms): 120862.4 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.700056E+00 | loss scale: 65536.0 | grad norm: 44675.374 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 102.47 | iteration 15903/ 292968 | consumed samples: 32569344 | consumed tokens: 16227778560 | elapsed time per iteration (ms): 120762.8 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.705688E+00 | loss scale: 65536.0 | grad norm: 42982.501 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 102.55 | iteration 15904/ 292968 | consumed samples: 32571392 | consumed tokens: 16229695488 | elapsed time per iteration (ms): 119706.6 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.689088E+00 | loss scale: 65536.0 | grad norm: 43410.905 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 103.46 | iteration 15905/ 292968 | consumed samples: 32573440 | consumed tokens: 16231612416 | elapsed time per iteration (ms): 120333.6 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.694792E+00 | loss scale: 65536.0 | grad norm: 50699.531 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 102.92 | iteration 15906/ 292968 | consumed samples: 32575488 | consumed tokens: 16233529344 | elapsed time per iteration (ms): 120733.4 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.690062E+00 | loss scale: 65536.0 | grad norm: 58349.060 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 0.017 | TFLOPs: 102.58 | iteration 15907/ 292968 | consumed samples: 32577536 | consumed tokens: 16235446272 | elapsed time per iteration (ms): 121625.0 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.706563E+00 | loss scale: 65536.0 | grad norm: 69679.960 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.83 | iteration 15908/ 292968 | consumed samples: 32579584 | consumed tokens: 16237363200 | elapsed time per iteration (ms): 123014.8 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.670264E+00 | loss scale: 65536.0 | grad norm: 56093.133 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.68 | iteration 15909/ 292968 | consumed samples: 32581632 | consumed tokens: 16239280128 | elapsed time per iteration (ms): 130636.8 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.701972E+00 | loss scale: 65536.0 | grad norm: 57095.088 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.80 | iteration 15910/ 292968 | consumed samples: 32583680 | consumed tokens: 16241197056 | elapsed time per iteration (ms): 126250.3 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.692456E+00 | loss scale: 65536.0 | grad norm: 53595.110 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.10 | iteration 15911/ 292968 | consumed samples: 32585728 | consumed tokens: 16243113984 | elapsed time per iteration (ms): 124488.1 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.666867E+00 | loss scale: 65536.0 | grad norm: 38773.491 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.48 | iteration 15912/ 292968 | consumed samples: 32587776 | consumed tokens: 16245030912 | elapsed time per iteration (ms): 124391.0 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.683891E+00 | loss scale: 65536.0 | grad norm: 44187.393 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.56 | iteration 15913/ 292968 | consumed samples: 32589824 | consumed tokens: 16246947840 | elapsed time per iteration (ms): 123843.7 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.684299E+00 | loss scale: 65536.0 | grad norm: 48584.532 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.00 | iteration 15914/ 292968 | consumed samples: 32591872 | consumed tokens: 16248864768 | elapsed time per iteration (ms): 124126.5 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.660931E+00 | loss scale: 65536.0 | grad norm: 52751.980 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.77 | iteration 15915/ 292968 | consumed samples: 32593920 | consumed tokens: 16250781696 | elapsed time per iteration (ms): 124379.8 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.653254E+00 | loss scale: 65536.0 | grad norm: 56313.304 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.57 | iteration 15916/ 292968 | consumed samples: 32595968 | consumed tokens: 16252698624 | elapsed time per iteration (ms): 125580.7 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.662419E+00 | loss scale: 65536.0 | grad norm: 55502.204 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.62 | iteration 15917/ 292968 | consumed samples: 32598016 | consumed tokens: 16254615552 | elapsed time per iteration (ms): 125998.7 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.672918E+00 | loss scale: 65536.0 | grad norm: 62430.454 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.29 | iteration 15918/ 292968 | consumed samples: 32600064 | consumed tokens: 16256532480 | elapsed time per iteration (ms): 124725.8 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.690456E+00 | loss scale: 65536.0 | grad norm: 61215.447 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.30 | iteration 15919/ 292968 | consumed samples: 32602112 | consumed tokens: 16258449408 | elapsed time per iteration (ms): 124347.5 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.687574E+00 | loss scale: 65536.0 | grad norm: 57220.961 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.60 | iteration 15920/ 292968 | consumed samples: 32604160 | consumed tokens: 16260366336 | elapsed time per iteration (ms): 126442.3 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.698216E+00 | loss scale: 65536.0 | grad norm: 55787.748 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.95 | iteration 15921/ 292968 | consumed samples: 32606208 | consumed tokens: 16262283264 | elapsed time per iteration (ms): 125010.4 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.705619E+00 | loss scale: 65536.0 | grad norm: 41874.976 | num zeros: 0.0 | curriculum seqlen: 936 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.07 | iteration 15922/ 292968 | consumed samples: 32608256 | consumed tokens: 16264200192 | elapsed time per iteration (ms): 127691.2 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.685787E+00 | loss scale: 65536.0 | grad norm: 44361.721 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.99 | iteration 15923/ 292968 | consumed samples: 32610304 | consumed tokens: 16266117120 | elapsed time per iteration (ms): 126189.1 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.675792E+00 | loss scale: 65536.0 | grad norm: 47550.316 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.14 | iteration 15924/ 292968 | consumed samples: 32612352 | consumed tokens: 16268034048 | elapsed time per iteration (ms): 126064.7 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.681153E+00 | loss scale: 65536.0 | grad norm: 44717.039 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.24 | iteration 15925/ 292968 | consumed samples: 32614400 | consumed tokens: 16269950976 | elapsed time per iteration (ms): 129211.1 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.688752E+00 | loss scale: 65536.0 | grad norm: 51002.903 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.85 | iteration 15926/ 292968 | consumed samples: 32616448 | consumed tokens: 16271867904 | elapsed time per iteration (ms): 138373.9 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.685332E+00 | loss scale: 65536.0 | grad norm: 54644.451 | num zeros: 0.0 | curriculum seqlen: 
936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 89.50 | iteration 15927/ 292968 | consumed samples: 32618496 | consumed tokens: 16273784832 | elapsed time per iteration (ms): 127034.0 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.680095E+00 | loss scale: 65536.0 | grad norm: 51976.833 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.49 | iteration 15928/ 292968 | consumed samples: 32620544 | consumed tokens: 16275701760 | elapsed time per iteration (ms): 126251.8 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.695301E+00 | loss scale: 65536.0 | grad norm: 51951.006 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.10 | iteration 15929/ 292968 | consumed samples: 32622592 | consumed tokens: 16277618688 | elapsed time per iteration (ms): 123863.7 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.713949E+00 | loss scale: 65536.0 | grad norm: 65651.850 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.99 | iteration 15930/ 292968 | consumed samples: 32624640 | consumed tokens: 16279535616 | elapsed time per iteration (ms): 125662.7 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.681840E+00 | loss scale: 65536.0 | grad norm: 78143.337 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.55 | iteration 15931/ 292968 | consumed samples: 32626688 | consumed tokens: 16281452544 | elapsed time per iteration (ms): 122628.5 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.688161E+00 | loss scale: 65536.0 | grad norm: 44133.428 | num zeros: 0.0 | 
curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.99 | iteration 15932/ 292968 | consumed samples: 32628736 | consumed tokens: 16283369472 | elapsed time per iteration (ms): 123024.3 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.671953E+00 | loss scale: 65536.0 | grad norm: 42101.644 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.67 | iteration 15933/ 292968 | consumed samples: 32630784 | consumed tokens: 16285286400 | elapsed time per iteration (ms): 124528.0 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.703274E+00 | loss scale: 65536.0 | grad norm: 55715.304 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.45 | iteration 15934/ 292968 | consumed samples: 32632832 | consumed tokens: 16287203328 | elapsed time per iteration (ms): 122722.3 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.700817E+00 | loss scale: 65536.0 | grad norm: 62413.326 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.92 | iteration 15935/ 292968 | consumed samples: 32634880 | consumed tokens: 16289120256 | elapsed time per iteration (ms): 124079.5 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.685281E+00 | loss scale: 65536.0 | grad norm: 50723.256 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.81 | iteration 15936/ 292968 | consumed samples: 32636928 | consumed tokens: 16291037184 | elapsed time per iteration (ms): 122685.0 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.671715E+00 | loss scale: 65536.0 | grad norm: 56774.086 | num 
zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.95 | iteration 15937/ 292968 | consumed samples: 32638976 | consumed tokens: 16292954112 | elapsed time per iteration (ms): 123173.0 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.682549E+00 | loss scale: 65536.0 | grad norm: 72192.537 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.55 | iteration 15938/ 292968 | consumed samples: 32641024 | consumed tokens: 16294871040 | elapsed time per iteration (ms): 122783.0 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.657823E+00 | loss scale: 65536.0 | grad norm: 51299.894 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.87 | iteration 15939/ 292968 | consumed samples: 32643072 | consumed tokens: 16296787968 | elapsed time per iteration (ms): 124532.7 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.697783E+00 | loss scale: 65536.0 | grad norm: 58888.808 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.45 | iteration 15940/ 292968 | consumed samples: 32645120 | consumed tokens: 16298704896 | elapsed time per iteration (ms): 122745.0 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.653794E+00 | loss scale: 65536.0 | grad norm: 71052.121 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.90 | iteration 15941/ 292968 | consumed samples: 32647168 | consumed tokens: 16300621824 | elapsed time per iteration (ms): 123511.8 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.689450E+00 | loss scale: 65536.0 | grad norm: 
60213.108 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.27 | iteration 15942/ 292968 | consumed samples: 32649216 | consumed tokens: 16302538752 | elapsed time per iteration (ms): 123275.9 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.669480E+00 | loss scale: 65536.0 | grad norm: 52934.012 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.46 | iteration 15943/ 292968 | consumed samples: 32651264 | consumed tokens: 16304455680 | elapsed time per iteration (ms): 123062.1 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.694126E+00 | loss scale: 65536.0 | grad norm: 31113.965 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.64 | iteration 15944/ 292968 | consumed samples: 32653312 | consumed tokens: 16306372608 | elapsed time per iteration (ms): 126784.5 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.673040E+00 | loss scale: 65536.0 | grad norm: 39968.707 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.68 | iteration 15945/ 292968 | consumed samples: 32655360 | consumed tokens: 16308289536 | elapsed time per iteration (ms): 126432.0 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.687227E+00 | loss scale: 65536.0 | grad norm: 50432.540 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.96 | iteration 15946/ 292968 | consumed samples: 32657408 | consumed tokens: 16310206464 | elapsed time per iteration (ms): 127070.2 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.722465E+00 | loss scale: 65536.0 
| grad norm: 55468.971 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.46 |
iteration 15947/ 292968 | consumed samples: 32659456 | consumed tokens: 16312123392 | elapsed time per iteration (ms): 125913.7 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.697623E+00 | loss scale: 65536.0 | grad norm: 57685.995 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.36 |
iteration 15948/ 292968 | consumed samples: 32661504 | consumed tokens: 16314040320 | elapsed time per iteration (ms): 126783.2 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.691214E+00 | loss scale: 65536.0 | grad norm: 73432.745 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.68 |
iteration 15949/ 292968 | consumed samples: 32663552 | consumed tokens: 16315957248 | elapsed time per iteration (ms): 127172.2 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.686218E+00 | loss scale: 65536.0 | grad norm: 60892.863 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.39 |
iteration 15950/ 292968 | consumed samples: 32665600 | consumed tokens: 16317874176 | elapsed time per iteration (ms): 123484.3 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.687557E+00 | loss scale: 65536.0 | grad norm: 51934.142 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.29 |
saving checkpoint at iteration 15950 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
[2022-01-28 08:15:17,696] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint:
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/mp_rank_00_model_states.pt [2022-01-28 08:15:17,897] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/mp_rank_01_model_states.pt [2022-01-28 08:15:39,989] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-01-28 08:15:40,072] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-01-28 08:15:40,824] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-01-28 08:15:41,529] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-01-28 08:15:42,539] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-01-28 08:15:43,134] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-01-28 08:15:43,262] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-01-28 08:15:43,395] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-01-28 08:15:43,590] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-01-28 08:15:43,683] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-01-28 08:15:43,858] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-01-28 08:15:43,910] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-01-28 08:15:43,937] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-01-28 08:15:43,972] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-01-28 08:15:44,196] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-01-28 08:15:44,195] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-01-28 08:15:44,208] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-01-28 08:15:44,804] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-01-28 08:15:44,817] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-01-28 08:15:44,876] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-01-28 08:15:45,036] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-01-28 08:15:45,146] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-01-28 08:15:45,240] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_124_optim_states.pt [2022-01-28 08:15:45,199] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-01-28 08:15:45,387] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-01-28 
08:15:45,431] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-01-28 08:15:45,442] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-01-28 08:15:45,468] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-01-28 08:15:45,498] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-01-28 08:15:45,500] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-01-28 08:15:45,533] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-01-28 08:15:45,697] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-01-28 08:15:45,803] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-01-28 08:15:45,838] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-01-28 08:15:45,842] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-01-28 08:15:45,853] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-01-28 08:15:45,969] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-01-28 08:15:45,985] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-01-28 08:15:46,013] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-01-28 08:15:46,074] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-01-28 08:15:46,155] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-01-28 08:15:46,183] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-01-28 08:15:46,365] [INFO] [engine.py:3007:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-01-28 08:15:46,358] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-01-28 08:15:46,380] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-01-28 08:15:46,400] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-01-28 08:15:46,553] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-01-28 08:15:46,661] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-01-28 08:15:46,990] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-01-28 08:15:47,051] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-01-28 08:15:47,266] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-01-28 08:15:47,241] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-01-28 08:15:47,385] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-01-28 08:15:47,434] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-01-28 08:15:47,490] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-01-28 08:15:47,647] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-01-28 08:15:47,840] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-01-28 08:15:47,923] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-01-28 08:15:48,008] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-01-28 08:15:48,151] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-01-28 
08:15:48,176] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-01-28 08:15:48,267] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-01-28 08:15:48,308] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-01-28 08:15:48,369] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-01-28 08:15:48,613] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-01-28 08:15:48,662] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-01-28 08:15:48,829] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-01-28 08:15:48,831] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-01-28 08:15:48,835] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-01-28 08:15:48,857] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-01-28 08:15:48,951] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-01-28 08:15:49,002] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-01-28 08:15:49,082] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-01-28 08:15:49,087] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-01-28 08:15:49,210] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-01-28 08:15:49,237] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-01-28 08:15:49,243] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-01-28 08:15:49,242] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-01-28 08:15:49,363] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-01-28 08:15:49,352] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-01-28 08:15:49,496] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-01-28 08:15:49,489] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-01-28 08:15:49,528] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-01-28 08:15:49,633] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-01-28 08:15:49,717] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-01-28 08:15:49,728] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-01-28 08:15:49,718] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-01-28 08:15:49,750] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-01-28 08:15:49,840] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-01-28 08:15:49,880] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-01-28 08:15:50,014] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-01-28 08:15:50,218] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-01-28 08:15:50,228] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-01-28 08:15:50,411] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-01-28 08:15:50,549] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-01-28 
08:15:50,567] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-01-28 08:15:50,621] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-01-28 08:15:50,660] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-01-28 08:15:50,780] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-01-28 08:15:50,809] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-01-28 08:15:50,932] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-01-28 08:15:51,056] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-01-28 08:15:51,076] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-01-28 08:15:51,120] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-01-28 08:15:51,443] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-01-28 08:15:51,532] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-01-28 08:15:51,693] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-01-28 08:15:51,721] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-01-28 08:15:52,238] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-01-28 08:15:53,608] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-01-28 08:15:53,905] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-01-28 08:15:54,979] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-01-28 08:15:55,183] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-01-28 08:15:55,205] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-01-28 08:15:55,364] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-01-28 08:15:55,380] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-01-28 08:15:55,848] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-01-28 08:15:55,977] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-01-28 08:15:56,276] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-01-28 08:15:56,561] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-01-28 08:15:56,725] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-01-28 08:15:56,748] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_50_optim_states.pt
[2022-01-28 08:15:56,883] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_51_optim_states.pt
[2022-01-28 08:15:56,958] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_33_optim_states.pt
[2022-01-28 08:15:57,071] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_95_optim_states.pt
[2022-01-28 08:15:57,323] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_90_optim_states.pt
[2022-01-28 08:15:57,388] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_94_optim_states.pt
[2022-01-28 08:15:57,590] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step15950/zero_pp_rank_0_mp_rank_91_optim_states.pt
successfully saved checkpoint at iteration 15950 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
time (ms) | save-checkpoint: 44711.66
iteration 15951/ 292968 | consumed samples: 32667648 | consumed tokens: 16319791104 | elapsed time per iteration (ms): 168604.9 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.686272E+00 | loss scale: 65536.0 | grad norm: 59371.132 | num zeros: 0.0 | curriculum seqlen: 936 | number of
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 73.45 | iteration 15952/ 292968 | consumed samples: 32669696 | consumed tokens: 16321708032 | elapsed time per iteration (ms): 124009.8 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.695672E+00 | loss scale: 65536.0 | grad norm: 73190.062 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 99.87 | iteration 15953/ 292968 | consumed samples: 32671744 | consumed tokens: 16323624960 | elapsed time per iteration (ms): 127205.6 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.693510E+00 | loss scale: 65536.0 | grad norm: 54350.257 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.36 | iteration 15954/ 292968 | consumed samples: 32673792 | consumed tokens: 16325541888 | elapsed time per iteration (ms): 126288.8 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.683270E+00 | loss scale: 65536.0 | grad norm: 38600.424 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.07 | iteration 15955/ 292968 | consumed samples: 32675840 | consumed tokens: 16327458816 | elapsed time per iteration (ms): 128621.5 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.642170E+00 | loss scale: 65536.0 | grad norm: 52646.518 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.29 | iteration 15956/ 292968 | consumed samples: 32677888 | consumed tokens: 16329375744 | elapsed time per iteration (ms): 130284.6 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.696537E+00 | loss scale: 65536.0 | grad norm: 68756.045 | num zeros: 0.0 | curriculum seqlen: 936 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.06 | iteration 15957/ 292968 | consumed samples: 32679936 | consumed tokens: 16331292672 | elapsed time per iteration (ms): 131027.6 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.686213E+00 | loss scale: 65536.0 | grad norm: 62106.029 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.52 | iteration 15958/ 292968 | consumed samples: 32681984 | consumed tokens: 16333209600 | elapsed time per iteration (ms): 135025.1 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.663595E+00 | loss scale: 65536.0 | grad norm: 69718.182 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 91.72 | iteration 15959/ 292968 | consumed samples: 32684032 | consumed tokens: 16335126528 | elapsed time per iteration (ms): 129478.3 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.661613E+00 | loss scale: 65536.0 | grad norm: 65061.356 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.65 | iteration 15960/ 292968 | consumed samples: 32686080 | consumed tokens: 16337043456 | elapsed time per iteration (ms): 130396.0 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.673151E+00 | loss scale: 65536.0 | grad norm: 42486.212 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.98 | iteration 15961/ 292968 | consumed samples: 32688128 | consumed tokens: 16338960384 | elapsed time per iteration (ms): 143239.3 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.682288E+00 | loss scale: 65536.0 | grad norm: 41829.666 | num zeros: 0.0 | curriculum 
seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.46 | iteration 15962/ 292968 | consumed samples: 32690176 | consumed tokens: 16340877312 | elapsed time per iteration (ms): 131493.3 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.658368E+00 | loss scale: 65536.0 | grad norm: 71944.923 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.18 | iteration 15963/ 292968 | consumed samples: 32692224 | consumed tokens: 16342794240 | elapsed time per iteration (ms): 132599.7 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.646533E+00 | loss scale: 65536.0 | grad norm: 67420.955 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 93.40 | iteration 15964/ 292968 | consumed samples: 32694272 | consumed tokens: 16344711168 | elapsed time per iteration (ms): 131569.6 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.687032E+00 | loss scale: 65536.0 | grad norm: 64675.500 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.13 | iteration 15965/ 292968 | consumed samples: 32696320 | consumed tokens: 16346628096 | elapsed time per iteration (ms): 134430.8 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.701680E+00 | loss scale: 65536.0 | grad norm: 47625.105 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 92.13 | iteration 15966/ 292968 | consumed samples: 32698368 | consumed tokens: 16348545024 | elapsed time per iteration (ms): 134794.2 | learning rate: 5.948E-05 | global batch size: 2048 | lm loss: 2.684699E+00 | loss scale: 65536.0 | grad norm: 54288.193 | num zeros: 0.0 | 
curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 91.88 | iteration 15967/ 292968 | consumed samples: 32700416 | consumed tokens: 16350461952 | elapsed time per iteration (ms): 133174.4 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.677641E+00 | loss scale: 65536.0 | grad norm: 66285.940 | num zeros: 0.0 | curriculum seqlen: 936 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 93.00 | iteration 15968/ 292968 | consumed samples: 32702464 | consumed tokens: 16352395264 | elapsed time per iteration (ms): 133026.8 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.684385E+00 | loss scale: 65536.0 | grad norm: 69008.484 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 93.89 | iteration 15969/ 292968 | consumed samples: 32704512 | consumed tokens: 16354328576 | elapsed time per iteration (ms): 132992.0 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.672369E+00 | loss scale: 65536.0 | grad norm: 53863.429 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 93.92 | iteration 15970/ 292968 | consumed samples: 32706560 | consumed tokens: 16356261888 | elapsed time per iteration (ms): 132266.9 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.692457E+00 | loss scale: 65536.0 | grad norm: 48178.569 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 94.43 | iteration 15971/ 292968 | consumed samples: 32708608 | consumed tokens: 16358195200 | elapsed time per iteration (ms): 131481.9 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.685313E+00 | loss scale: 65536.0 | grad norm: 56425.660 | num 
zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.00 | iteration 15972/ 292968 | consumed samples: 32710656 | consumed tokens: 16360128512 | elapsed time per iteration (ms): 130088.1 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.687281E+00 | loss scale: 65536.0 | grad norm: 72145.588 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.02 | iteration 15973/ 292968 | consumed samples: 32712704 | consumed tokens: 16362061824 | elapsed time per iteration (ms): 130436.9 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.692050E+00 | loss scale: 65536.0 | grad norm: 72922.213 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.76 | iteration 15974/ 292968 | consumed samples: 32714752 | consumed tokens: 16363995136 | elapsed time per iteration (ms): 130600.2 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.717539E+00 | loss scale: 65536.0 | grad norm: 52753.611 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.64 | iteration 15975/ 292968 | consumed samples: 32716800 | consumed tokens: 16365928448 | elapsed time per iteration (ms): 129497.0 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.691189E+00 | loss scale: 65536.0 | grad norm: 62716.768 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.45 | iteration 15976/ 292968 | consumed samples: 32718848 | consumed tokens: 16367861760 | elapsed time per iteration (ms): 128149.3 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.688054E+00 | loss scale: 65536.0 | grad norm: 
67483.611 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.47 | iteration 15977/ 292968 | consumed samples: 32720896 | consumed tokens: 16369795072 | elapsed time per iteration (ms): 137131.9 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.679620E+00 | loss scale: 65536.0 | grad norm: 65253.997 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 91.08 | iteration 15978/ 292968 | consumed samples: 32722944 | consumed tokens: 16371728384 | elapsed time per iteration (ms): 129630.6 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.680495E+00 | loss scale: 65536.0 | grad norm: 59561.884 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.35 | iteration 15979/ 292968 | consumed samples: 32724992 | consumed tokens: 16373661696 | elapsed time per iteration (ms): 129073.6 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.685628E+00 | loss scale: 65536.0 | grad norm: 48404.434 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.77 | iteration 15980/ 292968 | consumed samples: 32727040 | consumed tokens: 16375595008 | elapsed time per iteration (ms): 127277.9 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.687327E+00 | loss scale: 65536.0 | grad norm: 45643.448 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.14 | iteration 15981/ 292968 | consumed samples: 32729088 | consumed tokens: 16377528320 | elapsed time per iteration (ms): 128122.8 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.686384E+00 | loss scale: 65536.0 | 
grad norm: 57777.955 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.49 | iteration 15982/ 292968 | consumed samples: 32731136 | consumed tokens: 16379461632 | elapsed time per iteration (ms): 130730.4 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.692290E+00 | loss scale: 65536.0 | grad norm: 52443.061 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.54 | iteration 15983/ 292968 | consumed samples: 32733184 | consumed tokens: 16381394944 | elapsed time per iteration (ms): 126591.0 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.694472E+00 | loss scale: 65536.0 | grad norm: 43784.958 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.67 | iteration 15984/ 292968 | consumed samples: 32735232 | consumed tokens: 16383328256 | elapsed time per iteration (ms): 126692.9 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.668613E+00 | loss scale: 65536.0 | grad norm: 50045.685 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.59 | iteration 15985/ 292968 | consumed samples: 32737280 | consumed tokens: 16385261568 | elapsed time per iteration (ms): 124142.3 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.674605E+00 | loss scale: 65536.0 | grad norm: 55680.814 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 100.61 | iteration 15986/ 292968 | consumed samples: 32739328 | consumed tokens: 16387194880 | elapsed time per iteration (ms): 124326.4 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.696438E+00 | loss 
scale: 65536.0 | grad norm: 54741.318 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 100.47 | iteration 15987/ 292968 | consumed samples: 32741376 | consumed tokens: 16389128192 | elapsed time per iteration (ms): 125220.6 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.681134E+00 | loss scale: 65536.0 | grad norm: 61284.427 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.75 | iteration 15988/ 292968 | consumed samples: 32743424 | consumed tokens: 16391061504 | elapsed time per iteration (ms): 126283.2 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.676522E+00 | loss scale: 65536.0 | grad norm: 61327.496 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.91 | iteration 15989/ 292968 | consumed samples: 32745472 | consumed tokens: 16392994816 | elapsed time per iteration (ms): 124218.2 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.689838E+00 | loss scale: 65536.0 | grad norm: 58107.440 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 100.55 | iteration 15990/ 292968 | consumed samples: 32747520 | consumed tokens: 16394928128 | elapsed time per iteration (ms): 125635.2 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.678180E+00 | loss scale: 65536.0 | grad norm: 47572.011 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.42 | iteration 15991/ 292968 | consumed samples: 32749568 | consumed tokens: 16396861440 | elapsed time per iteration (ms): 124744.4 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 
2.684251E+00 | loss scale: 65536.0 | grad norm: 55458.596 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 100.13 | iteration 15992/ 292968 | consumed samples: 32751616 | consumed tokens: 16398794752 | elapsed time per iteration (ms): 123993.4 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.663623E+00 | loss scale: 65536.0 | grad norm: 60604.964 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.74 | iteration 15993/ 292968 | consumed samples: 32753664 | consumed tokens: 16400728064 | elapsed time per iteration (ms): 123987.1 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.679855E+00 | loss scale: 65536.0 | grad norm: 63857.455 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.74 | iteration 15994/ 292968 | consumed samples: 32755712 | consumed tokens: 16402661376 | elapsed time per iteration (ms): 122383.9 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.673487E+00 | loss scale: 65536.0 | grad norm: 53287.217 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 102.06 | iteration 15995/ 292968 | consumed samples: 32757760 | consumed tokens: 16404594688 | elapsed time per iteration (ms): 130593.9 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.660632E+00 | loss scale: 65536.0 | grad norm: 54820.221 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.64 | iteration 15996/ 292968 | consumed samples: 32759808 | consumed tokens: 16406528000 | elapsed time per iteration (ms): 123624.7 | learning rate: 5.947E-05 | global batch size: 
2048 | lm loss: 2.690095E+00 | loss scale: 65536.0 | grad norm: 75750.343 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.04 |
iteration 15997/ 292968 | consumed samples: 32761856 | consumed tokens: 16408461312 | elapsed time per iteration (ms): 122406.3 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.671646E+00 | loss scale: 65536.0 | grad norm: 41410.535 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 102.04 |
iteration 15998/ 292968 | consumed samples: 32763904 | consumed tokens: 16410394624 | elapsed time per iteration (ms): 124432.0 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.674898E+00 | loss scale: 65536.0 | grad norm: 38414.866 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 100.38 |
iteration 15999/ 292968 | consumed samples: 32765952 | consumed tokens: 16412327936 | elapsed time per iteration (ms): 123425.6 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.681317E+00 | loss scale: 65536.0 | grad norm: 50976.033 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.20 |
[2022-01-28 10:03:11,427] [INFO] [logging.py:69:log_dist] [Rank 0] step=16000, skipped=36, lr=[5.9470819998512435e-05, 5.9470819998512435e-05], mom=[(0.9, 0.95), (0.9, 0.95)]
steps: 16000 loss: 2.6756 iter time (s): 28.640 samples/sec: 71.508
iteration 16000/ 292968 | consumed samples: 32768000 | consumed tokens: 16414261248 | elapsed time per iteration (ms): 123132.1 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.675595E+00 | loss scale: 65536.0 | grad norm: 44097.381 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.44 |
saving checkpoint at iteration 16000 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
[2022-01-28 10:03:16,866] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/mp_rank_01_model_states.pt
[2022-01-28 10:03:17,043] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/mp_rank_00_model_states.pt
[2022-01-28 10:03:37,910] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_110_optim_states.pt
[2022-01-28 10:03:38,587] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_108_optim_states.pt
[2022-01-28 10:03:40,398] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_25_optim_states.pt
[2022-01-28 10:03:40,443] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_119_optim_states.pt
[2022-01-28 10:03:40,459] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_118_optim_states.pt
[2022-01-28 10:03:40,573] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_07_optim_states.pt
[2022-01-28 10:03:41,815] [INFO]
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-01-28 10:03:41,820] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-01-28 10:03:41,873] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-01-28 10:03:41,898] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-01-28 10:03:42,041] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-01-28 10:03:42,216] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-01-28 10:03:42,415] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-01-28 10:03:42,599] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-01-28 10:03:42,661] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_105_optim_states.pt 
[2022-01-28 10:03:42,753] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-01-28 10:03:42,728] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-01-28 10:03:42,928] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-01-28 10:03:43,076] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-01-28 10:03:43,319] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-01-28 10:03:43,390] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-01-28 10:03:43,460] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-01-28 10:03:43,511] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-01-28 10:03:43,543] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-01-28 10:03:43,531] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-01-28 10:03:43,604] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-01-28 10:03:43,781] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-01-28 10:03:44,366] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-01-28 10:03:44,409] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-01-28 10:03:44,462] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-01-28 10:03:44,488] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-01-28 10:03:44,515] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-01-28 10:03:44,594] [INFO] [engine.py:3007:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-01-28 10:03:44,693] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-01-28 10:03:44,763] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-01-28 10:03:44,763] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-01-28 10:03:44,785] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-01-28 10:03:44,861] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-01-28 10:03:44,926] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-01-28 10:03:45,042] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-01-28 10:03:45,092] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-01-28 10:03:45,207] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-01-28 10:03:45,378] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-01-28 10:03:45,468] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-01-28 10:03:45,503] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-01-28 10:03:45,544] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-01-28 10:03:45,547] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-01-28 10:03:46,014] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-01-28 10:03:46,137] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-01-28 10:03:46,221] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-01-28 
10:03:46,239] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-01-28 10:03:46,422] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-01-28 10:03:46,570] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-01-28 10:03:46,616] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-01-28 10:03:46,667] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-01-28 10:03:46,716] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-01-28 10:03:46,895] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-01-28 10:03:46,908] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-01-28 10:03:46,988] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-01-28 10:03:47,029] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-01-28 10:03:47,304] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-01-28 10:03:47,354] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-01-28 10:03:47,380] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-01-28 10:03:47,373] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-01-28 10:03:47,483] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-01-28 10:03:47,607] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-01-28 10:03:47,658] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-01-28 10:03:47,694] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-01-28 10:03:47,686] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-01-28 10:03:47,727] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-01-28 10:03:47,778] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-01-28 10:03:47,800] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-01-28 10:03:47,951] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-01-28 10:03:48,063] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-01-28 10:03:48,016] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-01-28 10:03:48,144] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-01-28 10:03:48,476] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-01-28 10:03:48,510] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-01-28 10:03:48,515] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-01-28 10:03:48,600] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-01-28 10:03:48,603] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-01-28 10:03:48,447] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-01-28 10:03:48,744] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-01-28 10:03:48,860] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-01-28 10:03:48,933] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-01-28 
10:03:49,039] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-01-28 10:03:49,049] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-01-28 10:03:49,101] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-01-28 10:03:49,083] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-01-28 10:03:49,301] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-01-28 10:03:49,389] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-01-28 10:03:49,451] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-01-28 10:03:49,476] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-01-28 10:03:49,567] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-01-28 10:03:49,626] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-01-28 10:03:49,656] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-01-28 10:03:49,707] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-01-28 10:03:49,729] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-01-28 10:03:49,735] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-01-28 10:03:49,739] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-01-28 10:03:49,823] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-01-28 10:03:49,852] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-01-28 10:03:49,879] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-01-28 10:03:49,881] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-01-28 10:03:50,037] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-01-28 10:03:50,179] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-01-28 10:03:50,393] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-01-28 10:03:50,513] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-01-28 10:03:50,809] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-01-28 10:03:51,216] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-01-28 10:03:51,280] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-01-28 10:03:51,394] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-01-28 10:03:51,630] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-01-28 10:03:51,963] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-01-28 10:03:52,880] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-01-28 10:03:53,040] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-01-28 10:03:54,424] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-01-28 10:03:54,875] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-01-28 10:03:54,942] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-01-28 10:03:54,922] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-01-28 
10:03:55,861] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-01-28 10:03:56,054] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_124_optim_states.pt [2022-01-28 10:03:56,360] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-01-28 10:03:56,481] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-01-28 10:03:56,507] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-01-28 10:03:56,597] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-01-28 10:03:58,119] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-01-28 10:03:58,140] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16000/zero_pp_rank_0_mp_rank_74_optim_states.pt successfully saved checkpoint at iteration 16000 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 time (ms) | save-checkpoint: 46677.16 iteration 16001/ 292968 | consumed samples: 32770048 | 
consumed tokens: 16416194560 | elapsed time per iteration (ms): 169160.9 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.685888E+00 | loss scale: 65536.0 | grad norm: 50550.771 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 73.84 | iteration 16002/ 292968 | consumed samples: 32772096 | consumed tokens: 16418127872 | elapsed time per iteration (ms): 122524.1 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.672439E+00 | loss scale: 65536.0 | grad norm: 58395.910 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.94 | iteration 16003/ 292968 | consumed samples: 32774144 | consumed tokens: 16420061184 | elapsed time per iteration (ms): 122720.5 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.667068E+00 | loss scale: 65536.0 | grad norm: 65175.506 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.78 | iteration 16004/ 292968 | consumed samples: 32776192 | consumed tokens: 16421994496 | elapsed time per iteration (ms): 125843.4 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.703931E+00 | loss scale: 65536.0 | grad norm: 71491.826 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.25 | iteration 16005/ 292968 | consumed samples: 32778240 | consumed tokens: 16423927808 | elapsed time per iteration (ms): 123882.1 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.679746E+00 | loss scale: 65536.0 | grad norm: 58373.338 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.83 | iteration 16006/ 292968 | consumed 
samples: 32780288 | consumed tokens: 16425861120 | elapsed time per iteration (ms): 121913.3 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.675282E+00 | loss scale: 65536.0 | grad norm: 43508.221 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 102.45 | iteration 16007/ 292968 | consumed samples: 32782336 | consumed tokens: 16427794432 | elapsed time per iteration (ms): 123123.9 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.672615E+00 | loss scale: 65536.0 | grad norm: 37749.283 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.45 | iteration 16008/ 292968 | consumed samples: 32784384 | consumed tokens: 16429727744 | elapsed time per iteration (ms): 122716.5 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.655367E+00 | loss scale: 65536.0 | grad norm: 47579.398 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.78 | iteration 16009/ 292968 | consumed samples: 32786432 | consumed tokens: 16431661056 | elapsed time per iteration (ms): 123447.9 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.678422E+00 | loss scale: 65536.0 | grad norm: 57459.995 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.18 | iteration 16010/ 292968 | consumed samples: 32788480 | consumed tokens: 16433594368 | elapsed time per iteration (ms): 122445.8 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.669987E+00 | loss scale: 65536.0 | grad norm: 60118.870 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 102.01 | iteration 16011/ 
292968 | consumed samples: 32790528 | consumed tokens: 16435527680 | elapsed time per iteration (ms): 124036.5 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.694063E+00 | loss scale: 65536.0 | grad norm: 48765.468 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.70 | iteration 16012/ 292968 | consumed samples: 32792576 | consumed tokens: 16437460992 | elapsed time per iteration (ms): 127488.6 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.691747E+00 | loss scale: 65536.0 | grad norm: 51627.046 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.97 | iteration 16013/ 292968 | consumed samples: 32794624 | consumed tokens: 16439394304 | elapsed time per iteration (ms): 122877.8 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.659035E+00 | loss scale: 65536.0 | grad norm: 62206.056 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.65 | iteration 16014/ 292968 | consumed samples: 32796672 | consumed tokens: 16441327616 | elapsed time per iteration (ms): 123161.0 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.686374E+00 | loss scale: 65536.0 | grad norm: 59175.406 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.42 | iteration 16015/ 292968 | consumed samples: 32798720 | consumed tokens: 16443260928 | elapsed time per iteration (ms): 123345.0 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.665487E+00 | loss scale: 65536.0 | grad norm: 51692.226 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.27 | 
iteration 16016/ 292968 | consumed samples: 32800768 | consumed tokens: 16445194240 | elapsed time per iteration (ms): 122663.0 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.687525E+00 | loss scale: 65536.0 | grad norm: 52208.461 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.83 | iteration 16017/ 292968 | consumed samples: 32802816 | consumed tokens: 16447127552 | elapsed time per iteration (ms): 124054.7 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.656977E+00 | loss scale: 65536.0 | grad norm: 60689.776 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.69 | iteration 16018/ 292968 | consumed samples: 32804864 | consumed tokens: 16449060864 | elapsed time per iteration (ms): 123165.2 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.714220E+00 | loss scale: 65536.0 | grad norm: 52351.622 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.41 | iteration 16019/ 292968 | consumed samples: 32806912 | consumed tokens: 16450994176 | elapsed time per iteration (ms): 123497.8 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.658452E+00 | loss scale: 65536.0 | grad norm: 54509.458 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.14 | iteration 16020/ 292968 | consumed samples: 32808960 | consumed tokens: 16452927488 | elapsed time per iteration (ms): 122795.1 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.682954E+00 | loss scale: 65536.0 | grad norm: 62859.831 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | 
TFLOPs: 101.72 | iteration 16021/ 292968 | consumed samples: 32811008 | consumed tokens: 16454860800 | elapsed time per iteration (ms): 122462.1 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.686487E+00 | loss scale: 65536.0 | grad norm: 52093.057 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 102.00 | iteration 16022/ 292968 | consumed samples: 32813056 | consumed tokens: 16456794112 | elapsed time per iteration (ms): 122953.9 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.689889E+00 | loss scale: 65536.0 | grad norm: 40259.679 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.59 | iteration 16023/ 292968 | consumed samples: 32815104 | consumed tokens: 16458727424 | elapsed time per iteration (ms): 121550.2 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.662580E+00 | loss scale: 65536.0 | grad norm: 48256.285 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 102.76 | iteration 16024/ 292968 | consumed samples: 32817152 | consumed tokens: 16460660736 | elapsed time per iteration (ms): 121930.8 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.656090E+00 | loss scale: 65536.0 | grad norm: 44588.438 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 102.44 | iteration 16025/ 292968 | consumed samples: 32819200 | consumed tokens: 16462594048 | elapsed time per iteration (ms): 122835.9 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.695139E+00 | loss scale: 65536.0 | grad norm: 38677.434 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 0.017 | TFLOPs: 101.68 | iteration 16026/ 292968 | consumed samples: 32821248 | consumed tokens: 16464527360 | elapsed time per iteration (ms): 123259.0 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.692554E+00 | loss scale: 65536.0 | grad norm: 40425.528 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.34 | iteration 16027/ 292968 | consumed samples: 32823296 | consumed tokens: 16466460672 | elapsed time per iteration (ms): 122814.4 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.664741E+00 | loss scale: 65536.0 | grad norm: 41026.574 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.70 | iteration 16028/ 292968 | consumed samples: 32825344 | consumed tokens: 16468393984 | elapsed time per iteration (ms): 127296.3 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.697254E+00 | loss scale: 65536.0 | grad norm: 32734.058 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.12 | iteration 16029/ 292968 | consumed samples: 32827392 | consumed tokens: 16470327296 | elapsed time per iteration (ms): 126086.2 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.671077E+00 | loss scale: 65536.0 | grad norm: 41491.110 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.06 | iteration 16030/ 292968 | consumed samples: 32829440 | consumed tokens: 16472260608 | elapsed time per iteration (ms): 130507.2 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.669100E+00 | loss scale: 65536.0 | grad norm: 54913.266 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 0.016 | TFLOPs: 95.71 | iteration 16031/ 292968 | consumed samples: 32831488 | consumed tokens: 16474193920 | elapsed time per iteration (ms): 125607.8 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.684384E+00 | loss scale: 65536.0 | grad norm: 61050.285 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.44 | iteration 16032/ 292968 | consumed samples: 32833536 | consumed tokens: 16476127232 | elapsed time per iteration (ms): 125245.7 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.689239E+00 | loss scale: 65536.0 | grad norm: 64881.466 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.73 | iteration 16033/ 292968 | consumed samples: 32835584 | consumed tokens: 16478060544 | elapsed time per iteration (ms): 123970.8 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.673563E+00 | loss scale: 65536.0 | grad norm: 62491.007 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.75 | iteration 16034/ 292968 | consumed samples: 32837632 | consumed tokens: 16479993856 | elapsed time per iteration (ms): 124188.5 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.668501E+00 | loss scale: 65536.0 | grad norm: 56525.872 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 100.58 | iteration 16035/ 292968 | consumed samples: 32839680 | consumed tokens: 16481927168 | elapsed time per iteration (ms): 126004.3 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.699851E+00 | loss scale: 65536.0 | grad norm: 62413.260 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 0.016 | TFLOPs: 99.13 | iteration 16036/ 292968 | consumed samples: 32841728 | consumed tokens: 16483860480 | elapsed time per iteration (ms): 126960.7 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.677656E+00 | loss scale: 65536.0 | grad norm: 76026.379 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.38 | iteration 16037/ 292968 | consumed samples: 32843776 | consumed tokens: 16485793792 | elapsed time per iteration (ms): 127580.1 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.691460E+00 | loss scale: 65536.0 | grad norm: 61536.426 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.90 | iteration 16038/ 292968 | consumed samples: 32845824 | consumed tokens: 16487727104 | elapsed time per iteration (ms): 130255.6 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.684567E+00 | loss scale: 65536.0 | grad norm: 53949.044 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.89 | iteration 16039/ 292968 | consumed samples: 32847872 | consumed tokens: 16489660416 | elapsed time per iteration (ms): 132074.2 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.686293E+00 | loss scale: 65536.0 | grad norm: 56672.927 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.57 | iteration 16040/ 292968 | consumed samples: 32849920 | consumed tokens: 16491593728 | elapsed time per iteration (ms): 133646.6 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.668138E+00 | loss scale: 65536.0 | grad norm: 62419.757 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 93.46 | srun: Job step aborted: Waiting up to 62 seconds for job step to finish. Killing subprocess 255099 Killing subprocess 255100 Killing subprocess 255101 Killing subprocess 273934 Killing subprocess 296037 Killing subprocess 273935 Killing subprocess 296038 Killing subprocess 270550 Killing subprocess 259223 Killing subprocess 273936 Killing subprocess 296039 Killing subprocess 256580 Killing subprocess 247475 Killing subprocess 296041 Killing subprocess 259224 Killing subprocess 247476 Killing subprocess 270551 Killing subprocess 258445 Killing subprocess 259225 Killing subprocess 256581 Killing subprocess 247477 Killing subprocess 258446 Killing subprocess 256395 Killing subprocess 259227 Killing subprocess 258447 Killing subprocess 256582 Killing subprocess 273938 Killing subprocess 273940 Killing subprocess 273943 Killing subprocess 273945 Killing subprocess 273947 Killing subprocess 256584 Killing subprocess 259229 Killing subprocess 256396 Killing subprocess 256586 Killing subprocess 323074 Killing subprocess 256397 Killing subprocess 255103 Killing subprocess 259230 Killing subprocess 255105 Killing subprocess 255107 Killing subprocess 255109 Killing subprocess 255111 Killing subprocess 296043 Main process received SIGTERM, exiting Killing subprocess 296046 Killing subprocess 296048 Killing subprocess 296050 Main process received SIGTERM, exiting Killing subprocess 256589 Killing subprocess 259233 Killing subprocess 270552 Killing subprocess 261366 Killing subprocess 270554 Killing subprocess 323075 Killing subprocess 270556 Killing subprocess 255124 Killing subprocess 270558 Killing subprocess 270561 Killing subprocess 256591 Killing subprocess 256399 Killing subprocess 256593 Killing subprocess 256401 Killing subprocess 323076 Killing subprocess 259236 Killing subprocess 260005 Killing subprocess 261367 Killing subprocess 256403 Main process received SIGTERM, exiting Killing subprocess 
255125 Main process received SIGTERM, exiting Killing subprocess 261368 Killing subprocess 256406 Killing subprocess 260006 Killing subprocess 255126 Killing subprocess 256408 Killing subprocess 255128 Main process received SIGTERM, exiting Killing subprocess 260007 Killing subprocess 241961 Killing subprocess 255130 Killing subprocess 247479 Killing subprocess 247481 Killing subprocess 247485 Killing subprocess 247487 Killing subprocess 247489 Main process received SIGTERM, exiting Killing subprocess 255133 Killing subprocess 241962 Killing subprocess 255135 Killing subprocess 258449 Killing subprocess 258451 Killing subprocess 258453 Killing subprocess 258456 Killing subprocess 258458 Main process received SIGTERM, exiting Killing subprocess 244019 Killing subprocess 255137 Killing subprocess 241963 Main process received SIGTERM, exiting Killing subprocess 261369 Killing subprocess 261372 Killing subprocess 261375 Killing subprocess 261377 Killing subprocess 261378 Main process received SIGTERM, exiting Killing subprocess 244020 Killing subprocess 244021 Killing subprocess 260009 Killing subprocess 260011 Killing subprocess 260014 Killing subprocess 260016 Killing subprocess 260018 Main process received SIGTERM, exiting Killing subprocess 323078 Killing subprocess 323080 Killing subprocess 241965 Killing subprocess 323082 Killing subprocess 241967 Killing subprocess 323085 Killing subprocess 241970 Killing subprocess 323087 Killing subprocess 241972 Main process received SIGTERM, exiting Killing subprocess 241974 Killing subprocess 270563 Main process received SIGTERM, exiting Killing subprocess 244023 Killing subprocess 244025 Killing subprocess 244026 Killing subprocess 244029 Killing subprocess 244031 Main process received SIGTERM, exiting Main process received SIGTERM, exiting Main process received SIGTERM, exiting slurmstepd: error: *** STEP 1557042.0 ON jean-zay-iam31 CANCELLED AT 2022-01-28T11:28:48 *** Killing subprocess 264788 Killing subprocess 264789 
Killing subprocess 264790 Killing subprocess 264792 Killing subprocess 264794 Killing subprocess 264797 Killing subprocess 264799 Killing subprocess 264801 Main process received SIGTERM, exiting ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** using world size: 128, data-parallel-size: 1, tensor-model-parallel size: 4, pipeline-model-parallel size: 32 using torch.float16 for parameters ... ------------------------ arguments ------------------------ accumulate_allreduce_grads_in_fp32 .............. False adam_beta1 ...................................... 0.9 adam_beta2 ...................................... 0.95 adam_eps ........................................ 1e-08 adlr_autoresume ................................. False adlr_autoresume_interval ........................ 1000 apply_query_key_layer_scaling ................... True apply_residual_connection_post_layernorm ........ False attention_dropout ............................... 0.1 attention_softmax_in_fp32 ....................... 
False bert_binary_head ................................ True bert_load ....................................... None bf16 ............................................ False bias_dropout_fusion ............................. True bias_gelu_fusion ................................ True biencoder_projection_dim ........................ 0 biencoder_shared_query_context_model ............ False block_data_path ................................. None checkpoint_activations .......................... True checkpoint_in_cpu ............................... False checkpoint_num_layers ........................... 1 clip_grad ....................................... 1.0 codecarbon_dir .................................. None consumed_train_samples .......................... 0 consumed_train_tokens ........................... 0 consumed_valid_samples .......................... 0 contigious_checkpointing ........................ False cpu_optimizer ................................... False cpu_torch_adam .................................. False curriculum_learning ............................. False data_impl ....................................... mmap data_parallel_size .............................. 1 data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document'] dataloader_type ................................. single DDP_impl ........................................ local decoder_seq_length .............................. None deepscale ....................................... False deepscale_config ................................ None deepspeed ....................................... True deepspeed_activation_checkpointing .............. True deepspeed_config ................................ ./ds_config.1650590.json deepspeed_mpi ................................... False distribute_checkpointed_activations ............. False distributed_backend ............................. 
nccl embed_layernorm ................................. True embedding_path .................................. None encoder_seq_length .............................. 2048 eod_mask_loss ................................... False eval_interval ................................... 150 eval_iters ...................................... 5 eval_only ....................................... None evidence_data_path .............................. None exit_duration_in_mins ........................... 1185 exit_interval ................................... None ffn_hidden_size ................................. 46400 finetune ........................................ False fp16 ............................................ True fp16_lm_cross_entropy ........................... False fp32_residual_connection ........................ False gigaflos_no_embeds .............................. 0 global_batch_size ............................... 2048 glu_activation .................................. None hidden_dropout .................................. 0.1 hidden_size ..................................... 11600 hysteresis ...................................... 2 ict_head_size ................................... None ict_load ........................................ None img_dim ......................................... 224 indexer_batch_size .............................. 128 indexer_log_interval ............................ 1000 init_method_std ................................. 0.006 init_method_xavier_uniform ...................... False initial_loss_scale .............................. 4294967296 kv_channels ..................................... 145 layernorm_epsilon ............................... 1e-05 lazy_mpu_init ................................... None load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 local_rank ...................................... 0 log_batch_size_to_tensorboard ................... 
True log_interval .................................... 1 log_learning_rate_to_tensorboard ................ True log_level ....................................... None log_level_replica ............................... None log_loss_scale_to_tensorboard ................... True log_num_zeros_in_grad ........................... False log_params_norm ................................. False log_timers_to_tensorboard ....................... True log_validation_ppl_to_tensorboard ............... True loss_on_targets_only ............................ False loss_scale ...................................... 12.0 loss_scale_window ............................... 1000 lr .............................................. 6e-05 lr_decay_iters .................................. None lr_decay_samples ................................ None lr_decay_style .................................. cosine lr_decay_tokens ................................. 260000000000 lr_warmup_fraction .............................. None lr_warmup_iters ................................. 0 lr_warmup_samples ............................... 3750000 make_vocab_size_divisible_by .................... 128 mask_prob ....................................... 0.15 masked_softmax_fusion ........................... True max_position_embeddings ......................... 2048 memory_centric_tiled_linear ..................... False merge_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt micro_batch_size ................................ 1 min_loss_scale .................................. 1.0 min_lr .......................................... 6e-06 mmap_warmup ..................................... False no_load_optim ................................... None no_load_rng ..................................... None no_save_optim ................................... None no_save_rng ..................................... 
None num_attention_heads ............................. 80 num_channels .................................... 3 num_classes ..................................... 1000 num_layers ...................................... 64 num_layers_per_virtual_pipeline_stage ........... None num_workers ..................................... 2 onnx_safe ....................................... None openai_gelu ..................................... False optimizer ....................................... adam override_lr_scheduler ........................... False params_dtype .................................... torch.float16 partition_activations ........................... False patch_dim ....................................... 16 pipeline_model_parallel_size .................... 32 position_embedding_type ......................... PositionEmbeddingType.absolute profile_backward ................................ False query_in_block_prob ............................. 0.1 rampup_batch_size ............................... None rank ............................................ 0 remote_device ................................... none reset_attention_mask ............................ False reset_position_ids .............................. False retriever_report_topk_accuracies ................ [] retriever_score_scaling ......................... False retriever_seq_length ............................ 256 reweight_loss_based_on_position_frequency ....... False sample_rate ..................................... 1.0 save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 save_interval ................................... 50 scatter_gather_tensors_in_pipeline .............. True scattered_embeddings ............................ False seed ............................................ 43 seq_length ...................................... 2048 sgd_momentum .................................... 
0.9 short_seq_prob .................................. 0.1 skip_train_iteration_range ...................... None split ........................................... 949,50,1 split_transformers .............................. False synchronize_each_layer .......................... False tensor_model_parallel_size ...................... 4 tensorboard_debug_dir ........................... None tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs//tensorboard/cl-a100 tensorboard_log_interval ........................ 1 tensorboard_queue_size .......................... 5 test_weighted_split_names ....................... None test_weighted_split_paths ....................... None test_weighted_split_splits ...................... None test_weighted_split_weights ..................... None tile_factor ..................................... 1 titles_data_path ................................ None tokenizer_name_or_path .......................... None tokenizer_type .................................. GPT2BPETokenizer train_iters ..................................... None train_samples ................................... 600000000 train_tokens .................................... 300000000000 train_weighted_split_paths ...................... None use_bnb_optimizer ............................... False use_checkpoint_lr_scheduler ..................... False use_contiguous_buffers_in_ddp ................... False use_cpu_initialization .......................... None use_one_sent_docs ............................... False use_pin_memory .................................. False valid_weighted_split_names ...................... None valid_weighted_split_paths ...................... None valid_weighted_split_splits ..................... None valid_weighted_split_weights .................... None virtual_pipeline_model_parallel_size ............ None vocab_extra_ids ................................. 
0 vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json weight_decay .................................... 0.1 world_size ...................................... 128 zero_allgather_bucket_size ...................... 0.0 zero_contigious_gradients ....................... False zero_reduce_bucket_size ......................... 0.0 zero_reduce_scatter ............................. False zero_stage ...................................... 1 -------------------- end of arguments --------------------- setting number of micro-batches to constant 2048 > building GPT2BPETokenizer tokenizer ... > padded vocab (size: 50257) with 431 dummy tokens (new size: 50688) -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] cpu_adagrad ............ [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] > setting tensorboard ...  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. 
async_io ............... [NO] ....... [NO] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] deepspeed info ................... 0.6.0+24fe7002, 24fe7002, elastic-ckpt-refresh deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=f689231 git_branch=log-grad-norm **** > initializing torch distributed ... > initializing tensor model parallel with size 4 > initializing pipeline model parallel with size 32 > setting random seeds to 43 ... [2022-02-02 09:29:41,211] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2761 and data parallel seed: 43 > compiling dataset index builder ... make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/data' make: Nothing to be done for 'default'. make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/data' >>> done with dataset index builder. Compilation time: 0.173 seconds > compiling and loading fused kernels ... /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module scaled_upper_triang_masked_softmax_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [1/2] /gpfslocalsys/cuda/11.2/bin/nvcc --generate-dependencies-with-compile --dependency-output scaled_upper_triang_masked_softmax_cuda.cuda.o.d -DTORCH_EXTENSION_NAME=scaled_upper_triang_masked_softmax_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include/TH -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include/THC -isystem /gpfslocalsys/cuda/11.2/include -isystem /gpfswork/rech/six/commun/conda/cutting-edge/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 
-gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -O3 --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda -std=c++14 -c /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_cuda.cu -o scaled_upper_triang_masked_softmax_cuda.cuda.o [2/2] c++ scaled_upper_triang_masked_softmax.o scaled_upper_triang_masked_softmax_cuda.cuda.o -shared -L/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/gpfslocalsys/cuda/11.2/lib64 -lcudart -o scaled_upper_triang_masked_softmax_cuda.so Loading extension module scaled_upper_triang_masked_softmax_cuda... /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module scaled_masked_softmax_cuda... Allowing ninja to set a default number of workers... 
(overridable by setting the environment variable MAX_JOBS=N) [1/2] /gpfslocalsys/cuda/11.2/bin/nvcc --generate-dependencies-with-compile --dependency-output scaled_masked_softmax_cuda.cuda.o.d -DTORCH_EXTENSION_NAME=scaled_masked_softmax_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include/TH -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include/THC -isystem /gpfslocalsys/cuda/11.2/include -isystem /gpfswork/rech/six/commun/conda/cutting-edge/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -O3 --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda -std=c++14 -c /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_cuda.cu -o scaled_masked_softmax_cuda.cuda.o /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h(320): warning: variable "batch_count" was declared but never referenced [2/2] c++ scaled_masked_softmax.o scaled_masked_softmax_cuda.cuda.o -shared -L/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/gpfslocalsys/cuda/11.2/lib64 -lcudart -o 
scaled_masked_softmax_cuda.so Loading extension module scaled_masked_softmax_cuda... /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module fused_mix_prec_layer_norm_cuda... Allowing ninja to set a default number of workers... 
(overridable by setting the environment variable MAX_JOBS=N) [1/2] /gpfslocalsys/cuda/11.2/bin/nvcc --generate-dependencies-with-compile --dependency-output layer_norm_cuda_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=fused_mix_prec_layer_norm_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include/TH -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include/THC -isystem /gpfslocalsys/cuda/11.2/include -isystem /gpfswork/rech/six/commun/conda/cutting-edge/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -O3 --use_fast_math -maxrregcount=50 -std=c++14 -c /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda_kernel.cu -o layer_norm_cuda_kernel.cuda.o [2/2] c++ layer_norm_cuda.o layer_norm_cuda_kernel.cuda.o -shared -L/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/gpfslocalsys/cuda/11.2/lib64 -lcudart -o fused_mix_prec_layer_norm_cuda.so Loading extension module fused_mix_prec_layer_norm_cuda... >>> done with compiling and loading fused kernels. Compilation time: 79.887 seconds time to initialize megatron (seconds): 69.272 [after megatron is initialized] datetime: 2022-02-02 09:31:01 building GPT model ... 
[2022-02-02 09:31:01,273] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 [2022-02-02 09:31:01,303] [INFO] [utils.py:824:see_memory_usage] Before Building Model [2022-02-02 09:31:01,303] [INFO]
[utils.py:825:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2022-02-02 09:31:01,303] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 39.5 GB, percent = 7.8%
[2022-02-02 09:31:01,304] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=1, data=0, model=0): 4, ProcessCoord(pipe=1, data=0, model=1): 5, ProcessCoord(pipe=1, data=0, model=2): 6, ProcessCoord(pipe=1, data=0, model=3): 7, ProcessCoord(pipe=2, data=0, model=0): 8, ProcessCoord(pipe=2, data=0, model=1): 9, ProcessCoord(pipe=2, data=0, model=2): 10, ProcessCoord(pipe=2, data=0, model=3): 11, ProcessCoord(pipe=3, data=0, model=0): 12, ProcessCoord(pipe=3, data=0, model=1): 13, ProcessCoord(pipe=3, data=0, model=2): 14, ProcessCoord(pipe=3, data=0, model=3): 15, ProcessCoord(pipe=4, data=0, model=0): 16, ProcessCoord(pipe=4, data=0, model=1): 17, ProcessCoord(pipe=4, data=0, model=2): 18, ProcessCoord(pipe=4, data=0, model=3): 19, ProcessCoord(pipe=5, data=0, model=0): 20, ProcessCoord(pipe=5, data=0, model=1): 21, ProcessCoord(pipe=5, data=0, model=2): 22, ProcessCoord(pipe=5, data=0, model=3): 23, ProcessCoord(pipe=6, data=0, model=0): 24, ProcessCoord(pipe=6, data=0, model=1): 25, ProcessCoord(pipe=6, data=0, model=2): 26, ProcessCoord(pipe=6, data=0, model=3): 27, ProcessCoord(pipe=7, data=0, model=0): 28, ProcessCoord(pipe=7, data=0, model=1): 29, ProcessCoord(pipe=7, data=0, model=2): 30, ProcessCoord(pipe=7, data=0, model=3): 31, ProcessCoord(pipe=8, data=0, model=0): 32, ProcessCoord(pipe=8, data=0, model=1): 33, ProcessCoord(pipe=8, data=0, model=2): 34, ProcessCoord(pipe=8, data=0, model=3): 35, ProcessCoord(pipe=9, data=0, model=0): 36, ProcessCoord(pipe=9, data=0,
model=1): 37, ProcessCoord(pipe=9, data=0, model=2): 38, ProcessCoord(pipe=9, data=0, model=3): 39, ProcessCoord(pipe=10, data=0, model=0): 40, ProcessCoord(pipe=10, data=0, model=1): 41, ProcessCoord(pipe=10, data=0, model=2): 42, ProcessCoord(pipe=10, data=0, model=3): 43, ProcessCoord(pipe=11, data=0, model=0): 44, ProcessCoord(pipe=11, data=0, model=1): 45, ProcessCoord(pipe=11, data=0, model=2): 46, ProcessCoord(pipe=11, data=0, model=3): 47, ProcessCoord(pipe=12, data=0, model=0): 48, ProcessCoord(pipe=12, data=0, model=1): 49, ProcessCoord(pipe=12, data=0, model=2): 50, ProcessCoord(pipe=12, data=0, model=3): 51, ProcessCoord(pipe=13, data=0, model=0): 52, ProcessCoord(pipe=13, data=0, model=1): 53, ProcessCoord(pipe=13, data=0, model=2): 54, ProcessCoord(pipe=13, data=0, model=3): 55, ProcessCoord(pipe=14, data=0, model=0): 56, ProcessCoord(pipe=14, data=0, model=1): 57, ProcessCoord(pipe=14, data=0, model=2): 58, ProcessCoord(pipe=14, data=0, model=3): 59, ProcessCoord(pipe=15, data=0, model=0): 60, ProcessCoord(pipe=15, data=0, model=1): 61, ProcessCoord(pipe=15, data=0, model=2): 62, ProcessCoord(pipe=15, data=0, model=3): 63, ProcessCoord(pipe=16, data=0, model=0): 64, ProcessCoord(pipe=16, data=0, model=1): 65, ProcessCoord(pipe=16, data=0, model=2): 66, ProcessCoord(pipe=16, data=0, model=3): 67, ProcessCoord(pipe=17, data=0, model=0): 68, ProcessCoord(pipe=17, data=0, model=1): 69, ProcessCoord(pipe=17, data=0, model=2): 70, ProcessCoord(pipe=17, data=0, model=3): 71, ProcessCoord(pipe=18, data=0, model=0): 72, ProcessCoord(pipe=18, data=0, model=1): 73, ProcessCoord(pipe=18, data=0, model=2): 74, ProcessCoord(pipe=18, data=0, model=3): 75, ProcessCoord(pipe=19, data=0, model=0): 76, ProcessCoord(pipe=19, data=0, model=1): 77, ProcessCoord(pipe=19, data=0, model=2): 78, ProcessCoord(pipe=19, data=0, model=3): 79, ProcessCoord(pipe=20, data=0, model=0): 80, ProcessCoord(pipe=20, data=0, model=1): 81, ProcessCoord(pipe=20, data=0, model=2): 82, 
ProcessCoord(pipe=20, data=0, model=3): 83, ProcessCoord(pipe=21, data=0, model=0): 84, ProcessCoord(pipe=21, data=0, model=1): 85, ProcessCoord(pipe=21, data=0, model=2): 86, ProcessCoord(pipe=21, data=0, model=3): 87, ProcessCoord(pipe=22, data=0, model=0): 88, ProcessCoord(pipe=22, data=0, model=1): 89, ProcessCoord(pipe=22, data=0, model=2): 90, ProcessCoord(pipe=22, data=0, model=3): 91, ProcessCoord(pipe=23, data=0, model=0): 92, ProcessCoord(pipe=23, data=0, model=1): 93, ProcessCoord(pipe=23, data=0, model=2): 94, ProcessCoord(pipe=23, data=0, model=3): 95, ProcessCoord(pipe=24, data=0, model=0): 96, ProcessCoord(pipe=24, data=0, model=1): 97, ProcessCoord(pipe=24, data=0, model=2): 98, ProcessCoord(pipe=24, data=0, model=3): 99, ProcessCoord(pipe=25, data=0, model=0): 100, ProcessCoord(pipe=25, data=0, model=1): 101, ProcessCoord(pipe=25, data=0, model=2): 102, ProcessCoord(pipe=25, data=0, model=3): 103, ProcessCoord(pipe=26, data=0, model=0): 104, ProcessCoord(pipe=26, data=0, model=1): 105, ProcessCoord(pipe=26, data=0, model=2): 106, ProcessCoord(pipe=26, data=0, model=3): 107, ProcessCoord(pipe=27, data=0, model=0): 108, ProcessCoord(pipe=27, data=0, model=1): 109, ProcessCoord(pipe=27, data=0, model=2): 110, ProcessCoord(pipe=27, data=0, model=3): 111, ProcessCoord(pipe=28, data=0, model=0): 112, ProcessCoord(pipe=28, data=0, model=1): 113, ProcessCoord(pipe=28, data=0, model=2): 114, ProcessCoord(pipe=28, data=0, model=3): 115, ProcessCoord(pipe=29, data=0, model=0): 116, ProcessCoord(pipe=29, data=0, model=1): 117, ProcessCoord(pipe=29, data=0, model=2): 118, ProcessCoord(pipe=29, data=0, model=3): 119, ProcessCoord(pipe=30, data=0, model=0): 120, ProcessCoord(pipe=30, data=0, model=1): 121, ProcessCoord(pipe=30, data=0, model=2): 122, ProcessCoord(pipe=30, data=0, model=3): 123, ProcessCoord(pipe=31, data=0, model=0): 124, ProcessCoord(pipe=31, data=0, model=1): 125, ProcessCoord(pipe=31, data=0, model=2): 126, ProcessCoord(pipe=31, data=0, 
model=3): 127}
[2022-02-02 09:31:02,983] [INFO] [module.py:365:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=5
    0: _to_float16
    1: EmbeddingPipe
    2:
    3: ParallelTransformerLayerPipe
    4: ParallelTransformerLayerPipe
stage=1 layers=2
    5: ParallelTransformerLayerPipe
    6: ParallelTransformerLayerPipe
stage=2 layers=2
    7: ParallelTransformerLayerPipe
    8: ParallelTransformerLayerPipe
stage=3 layers=2
    9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
stage=4 layers=2
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
stage=5 layers=2
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=6 layers=2
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
stage=7 layers=2
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
stage=8 layers=2
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
stage=9 layers=2
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
stage=10 layers=2
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
stage=11 layers=2
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
stage=12 layers=2
    27: ParallelTransformerLayerPipe
    28: ParallelTransformerLayerPipe
stage=13 layers=2
    29: ParallelTransformerLayerPipe
    30: ParallelTransformerLayerPipe
stage=14 layers=2
    31: ParallelTransformerLayerPipe
    32: ParallelTransformerLayerPipe
stage=15 layers=2
    33: ParallelTransformerLayerPipe
    34: ParallelTransformerLayerPipe
stage=16 layers=2
    35: ParallelTransformerLayerPipe
    36: ParallelTransformerLayerPipe
stage=17 layers=2
    37: ParallelTransformerLayerPipe
    38: ParallelTransformerLayerPipe
stage=18 layers=2
    39: ParallelTransformerLayerPipe
    40: ParallelTransformerLayerPipe
stage=19 layers=2
    41: ParallelTransformerLayerPipe
    42: ParallelTransformerLayerPipe
stage=20 layers=2
    43: ParallelTransformerLayerPipe
    44: ParallelTransformerLayerPipe
stage=21 layers=2
    45: ParallelTransformerLayerPipe
    46: ParallelTransformerLayerPipe
stage=22 layers=2
    47: ParallelTransformerLayerPipe
    48: ParallelTransformerLayerPipe
stage=23 layers=2
    49: ParallelTransformerLayerPipe
    50: ParallelTransformerLayerPipe
stage=24 layers=2
    51: ParallelTransformerLayerPipe
    52: ParallelTransformerLayerPipe
stage=25 layers=2
    53: ParallelTransformerLayerPipe
    54: ParallelTransformerLayerPipe
stage=26 layers=2
    55: ParallelTransformerLayerPipe
    56: ParallelTransformerLayerPipe
stage=27 layers=2
    57: ParallelTransformerLayerPipe
    58: ParallelTransformerLayerPipe
stage=28 layers=2
    59: ParallelTransformerLayerPipe
    60: ParallelTransformerLayerPipe
stage=29 layers=2
    61: ParallelTransformerLayerPipe
    62: ParallelTransformerLayerPipe
stage=30 layers=2
    63: ParallelTransformerLayerPipe
    64: ParallelTransformerLayerPipe
stage=31 layers=6
    65: ParallelTransformerLayerPipe
    66: ParallelTransformerLayerPipe
    67:
    68: MixedFusedLayerNorm
    69: EmbeddingPipe
    70: float16_to_fp32
  loss: CrossEntropy
[2022-02-02 09:31:03,660] [INFO] [utils.py:824:see_memory_usage] After Building Model
[2022-02-02 09:31:03,660] [INFO] [utils.py:825:see_memory_usage] MA 1.88 GB Max_MA 1.88 GB CA 1.91 GB Max_CA 2 GB
[2022-02-02 09:31:03,660] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 39.84 GB, percent = 7.9%
setting training iterations to 292968
> learning rate decay style: cosine
DeepSpeed is enabled.
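The topology logged above assigns global ranks to `ProcessCoord(pipe, data, model)` entries in row-major order. The following is a minimal sketch of that mapping, assuming axis sizes of 32 pipeline stages, 1 data-parallel replica, and 4 model-parallel ranks as read off the log. The helpers `coord_to_rank` and `rank_to_coord` are hypothetical names used only for illustration; they are not DeepSpeed APIs.

```python
from collections import namedtuple

# Same field order as the coordinates printed in the log.
ProcessCoord = namedtuple("ProcessCoord", ["pipe", "data", "model"])

# Axis sizes inferred from the logged topology (an assumption, not read from a config).
PIPE_SIZE, DATA_SIZE, MODEL_SIZE = 32, 1, 4


def coord_to_rank(pipe, data, model):
    """Row-major flattening: pipe is the slowest axis, model the fastest."""
    return (pipe * DATA_SIZE + data) * MODEL_SIZE + model


def rank_to_coord(rank):
    """Inverse of coord_to_rank for ranks in [0, PIPE_SIZE * DATA_SIZE * MODEL_SIZE)."""
    pipe, rest = divmod(rank, DATA_SIZE * MODEL_SIZE)
    data, model = divmod(rest, MODEL_SIZE)
    return ProcessCoord(pipe, data, model)


if __name__ == "__main__":
    print(rank_to_coord(70))
    print(coord_to_rank(31, 0, 3))
```

Under these assumptions, rank 70 lands on pipeline stage 17 at model-parallel index 2, which agrees with the `ProcessCoord(pipe=17, data=0, model=2): 70` entry in the log.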
[2022-02-02 09:31:03,774] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed info: version=0.6.0+24fe7002, git-hash=24fe7002, git-branch=elastic-ckpt-refresh
Rank: 48 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 49 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 10 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 11 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 104 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 105 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 91 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 90 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 93 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 110 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
[2022-02-02 09:31:04,583] [INFO] [engine.py:275:__init__] DeepSpeed Flops Profiler Enabled: False
[2022-02-02 09:31:04,583] [INFO] [engine.py:1093:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2022-02-02 09:31:04,584] [INFO] [engine.py:1099:_configure_optimizer] Using client Optimizer as basic optimizer
[2022-02-02 09:31:04,584] [INFO] [engine.py:1115:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2022-02-02 09:31:04,584] [INFO] [utils.py:44:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2022-02-02 09:31:04,584] [INFO] [logging.py:69:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2022-02-02 09:31:04,584] [INFO] [stage_1_and_2.py:121:__init__] Reduce bucket size 500000000
[2022-02-02 09:31:04,584] [INFO] [stage_1_and_2.py:122:__init__] Allgather bucket size 500000000
[2022-02-02 09:31:04,584] [INFO] [stage_1_and_2.py:123:__init__] CPU Offload: False
[2022-02-02 09:31:04,584] [INFO] [stage_1_and_2.py:124:__init__] Round robin gradient partitioning: False
Rank: 92 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 80 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 111 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 28 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 81 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 53 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 51 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 84 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 54 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 15 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 85 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 14 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 52 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 55 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 50 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 20 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 21 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 30 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 31 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 26 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 27 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 75 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 29 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 74 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 79 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 78 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 97 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 96 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 68 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 61 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 8 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 9 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 60 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 69 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 47 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 36 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 46 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 37 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 22 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 23 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 117 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 116 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 41 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 40 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 112 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 13 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 113 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 12 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 38 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 39 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 17 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 66 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 67 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 16 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 43 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 42 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 101 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 115 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 103 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 100 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 102 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 24 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 25 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 114 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 83 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 87 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 45 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 44 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 82 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 34 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 86 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 35 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 88 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 89 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 4 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 5 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 56 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 57 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 33 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 32 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 106 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 107 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 121 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 120 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 108 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 109 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 70 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 71 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 59 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 58 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 64 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 65 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 77 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 118 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 76 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 119 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 19 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 18 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 122 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 123 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 73 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 72 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 62 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 63 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 94 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 95 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 99 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 98 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 1 partition count [1, 1] and sizes[(978123600, False), (191400, False)]
Rank: 7 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 0 partition count [1, 1] and sizes[(978123600, False), (191400, False)]
Rank: 6 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 126 partition count [1, 1] and sizes[(978123600, False), (214600, False)]
Rank: 127 partition count [1, 1] and sizes[(978123600, False), (214600, False)]
Rank: 125 partition count [1, 1] and sizes[(978123600, False), (214600, False)]
Rank: 124 partition count [1, 1] and sizes[(978123600, False), (214600, False)]
Rank: 2 partition count [1, 1] and sizes[(978123600, False), (191400, False)]
Rank: 3 partition count [1, 1] and sizes[(978123600, False), (191400, False)]
[2022-02-02 09:31:09,408] [INFO] [utils.py:824:see_memory_usage] Before initializing optimizer states
[2022-02-02 09:31:09,409] [INFO] [utils.py:825:see_memory_usage] MA 5.47 GB Max_MA 7.29 GB CA 9.25 GB Max_CA 9 GB
[2022-02-02 09:31:09,409] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 39.73 GB, percent = 7.9%
[2022-02-02 09:31:09,488] [INFO] [utils.py:824:see_memory_usage] After initializing optimizer states
[2022-02-02 09:31:09,488] [INFO] [utils.py:825:see_memory_usage] MA 12.76 GB Max_MA 16.41 GB CA 20.19 GB Max_CA 20 GB
[2022-02-02 09:31:09,489] [INFO]
[utils.py:833:see_memory_usage] CPU Virtual Memory: used = 39.73 GB, percent = 7.9%
[2022-02-02 09:31:09,489] [INFO] [stage_1_and_2.py:493:__init__] optimizer state initialized
[2022-02-02 09:31:09,508] [INFO] [utils.py:824:see_memory_usage] After initializing ZeRO optimizer
[2022-02-02 09:31:09,508] [INFO] [utils.py:825:see_memory_usage] MA 12.76 GB Max_MA 12.76 GB CA 20.19 GB Max_CA 20 GB
[2022-02-02 09:31:09,509] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 39.73 GB, percent = 7.9%
[2022-02-02 09:31:09,509] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2022-02-02 09:31:09,509] [INFO] [engine.py:805:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2022-02-02 09:31:09,509] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2022-02-02 09:31:09,509] [INFO] [logging.py:69:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.95), (0.9, 0.95)]
[2022-02-02 09:31:09,509] [INFO] [config.py:1058:print] DeepSpeedEngine configuration:
[2022-02-02 09:31:09,509] [INFO] [config.py:1062:print] activation_checkpointing_config {
    "partition_activations": false,
    "contiguous_memory_optimization": false,
    "cpu_checkpointing": false,
    "number_checkpoints": null,
    "synchronize_checkpoint_boundary": false,
    "profile": false
}
[2022-02-02 09:31:09,509] [INFO] [config.py:1062:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2022-02-02 09:31:09,509] [INFO] [config.py:1062:print] amp_enabled .................. False
[2022-02-02 09:31:09,509] [INFO] [config.py:1062:print] amp_params ................... False
[2022-02-02 09:31:09,509] [INFO] [config.py:1062:print] autotuning_config ............ {
    "enabled": false,
    "start_step": null,
    "end_step": null,
    "metric_path": null,
    "arg_mappings": null,
    "metric": "throughput",
    "model_info": null,
    "results_dir": null,
    "exps_dir": null,
    "overwrite": true,
    "fast": true,
    "start_profile_step": 3,
    "end_profile_step": 5,
    "tuner_type": "gridsearch",
    "tuner_early_stopping": 5,
    "tuner_num_trials": 50,
    "model_info_path": null,
    "mp_size": 1,
    "max_train_batch_size": null,
    "min_train_batch_size": 1,
    "max_train_micro_batch_size_per_gpu": 1.024000e+03,
    "min_train_micro_batch_size_per_gpu": 1,
    "num_tuning_micro_batch_sizes": 3
}
[2022-02-02 09:31:09,509] [INFO] [config.py:1062:print] bfloat16_enabled ............. False
[2022-02-02 09:31:09,509] [INFO] [config.py:1062:print] checkpoint_tag_validation_enabled True
[2022-02-02 09:31:09,509] [INFO] [config.py:1062:print] checkpoint_tag_validation_fail False
[2022-02-02 09:31:09,509] [INFO] [config.py:1062:print] communication_data_type ...... None
[2022-02-02 09:31:09,509] [INFO] [config.py:1062:print] curriculum_enabled ........... True
[2022-02-02 09:31:09,509] [INFO] [config.py:1062:print] curriculum_params ............ {'curriculum_type': 'seqlen', 'min_difficulty': 64, 'max_difficulty': 2048, 'schedule_type': 'fixed_linear', 'schedule_config': {'total_curriculum_step': 36000, 'difficulty_step': 8}}
[2022-02-02 09:31:09,509] [INFO] [config.py:1062:print] dataloader_drop_last ......... False
[2022-02-02 09:31:09,509] [INFO] [config.py:1062:print] disable_allgather ............ False
[2022-02-02 09:31:09,509] [INFO] [config.py:1062:print] dump_state ................... False
[2022-02-02 09:31:09,509] [INFO] [config.py:1062:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2022-02-02 09:31:09,509] [INFO] [config.py:1062:print] eigenvalue_enabled ........... False
[2022-02-02 09:31:09,509] [INFO] [config.py:1062:print] eigenvalue_gas_boundary_resolution 1
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] eigenvalue_layer_name ........ bert.encoder.layer
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] eigenvalue_layer_num ......... 0
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] eigenvalue_max_iter .......... 100
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] eigenvalue_stability ......... 1e-06
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] eigenvalue_tol ............... 0.01
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] eigenvalue_verbose ........... False
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] elasticity_enabled ........... False
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] flops_profiler_config ........ {
    "enabled": false,
    "profile_step": 1,
    "module_depth": -1,
    "top_modules": 1,
    "detailed": true,
    "output_file": null
}
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] fp16_enabled ................. True
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] fp16_master_weights_and_gradients False
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] fp16_mixed_quantize .......... False
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] global_rank .................. 0
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] gradient_accumulation_steps .. 2048
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] gradient_clipping ............ 1.0
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] gradient_predivide_factor .... 1.0
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] initial_dynamic_scale ........ 4096
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] loss_scale ................... 0
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] memory_breakdown ............. False
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] optimizer_legacy_fusion ...... False
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] optimizer_name ............... None
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] optimizer_params ............. None
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] pld_enabled .................. False
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] pld_params ................... False
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] prescale_gradients ........... False
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] quantize_change_rate ......... 0.001
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] quantize_groups .............. 1
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] quantize_offset .............. 1000
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] quantize_period .............. 1000
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] quantize_rounding ............ 0
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] quantize_start_bits .......... 16
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] quantize_target_bits ......... 8
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] quantize_training_enabled .... False
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] quantize_type ................ 0
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] quantize_verbose ............. False
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] scheduler_name ............... None
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] scheduler_params ............. None
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] sparse_attention ............. None
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] sparse_gradients_enabled ..... False
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] steps_per_print .............. 2000
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] tensorboard_enabled .......... False
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] tensorboard_job_name ......... DeepSpeedJobName
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] tensorboard_output_path ......
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] train_batch_size ............. 2048
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] train_micro_batch_size_per_gpu 1
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] use_quantizer_kernel ......... False
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] wall_clock_breakdown ......... False
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] world_size ................... 1
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] zero_allow_untested_optimizer False
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] zero_config .................. {
    "stage": 1,
    "contiguous_gradients": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 5.000000e+08,
    "allgather_partitions": true,
    "allgather_bucket_size": 5.000000e+08,
    "overlap_comm": false,
    "load_from_fp32_weights": true,
    "elastic_checkpoint": false,
    "offload_param": null,
    "offload_optimizer": null,
    "sub_group_size": 1.000000e+09,
    "prefetch_bucket_size": 5.000000e+07,
    "param_persistence_threshold": 1.000000e+05,
    "max_live_parameters": 1.000000e+09,
    "max_reuse_distance": 1.000000e+09,
    "gather_fp16_weights_on_model_save": false,
    "ignore_unused_parameters": true,
    "round_robin_gradients": false,
    "legacy_stage1": false
}
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] zero_enabled ................. True
[2022-02-02 09:31:09,510] [INFO] [config.py:1062:print] zero_optimization_stage ...... 1
[2022-02-02 09:31:09,511] [INFO] [config.py:1064:print] json = {
    "train_micro_batch_size_per_gpu": 1,
    "train_batch_size": 2.048000e+03,
    "gradient_clipping": 1.0,
    "elastic_checkpoint": true,
    "zero_optimization": {
        "stage": 1
    },
    "fp16": {
        "enabled": true,
        "loss_scale": 0,
        "loss_scale_window": 500,
        "hysteresis": 2,
        "min_loss_scale": 1,
        "initial_scale_power": 12
    },
    "curriculum_learning": {
        "enabled": true,
        "curriculum_type": "seqlen",
        "min_difficulty": 64,
        "max_difficulty": 2.048000e+03,
        "schedule_type": "fixed_linear",
        "schedule_config": {
            "total_curriculum_step": 3.600000e+04,
            "difficulty_step": 8
        }
    },
    "steps_per_print": 2.000000e+03,
    "wall_clock_breakdown": false
}
[2022-02-02 09:31:09,511] [INFO] [engine.py:93:__init__] CONFIG: micro_batches=2048 micro_batch_size=1
[2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=0 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=6 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=2 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=4 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=1 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=3 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000
(104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=5 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=7 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=68 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=65 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=70 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=64 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=66 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=67 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=71 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=69 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 
(807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=32 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=37 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=34 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=35 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=101 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=99 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=103 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=33 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=36 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] 
[engine.py:151:__init__] RANK=39 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=38 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=96 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=97 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=100 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=21 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=20 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=22 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=17 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=102 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 
(104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=98 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=16 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=119 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=114 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=112 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=115 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=85 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=84 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=81 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=87 
STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=19 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=23 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=55 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=54 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=52 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=50 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=113 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=117 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=118 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) 
UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=15 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=14 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=9 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=12 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=31 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=30 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=29 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=24 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=18 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=49 STAGE=12 LAYERS=2 [27, 29) 
STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=73 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=78 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=116 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=10 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=11 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=89 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=93 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=90 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=82 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 
09:31:11,835] [INFO] [engine.py:151:__init__] RANK=86 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=83 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=80 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=43 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=41 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=47 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=110 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=106 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=111 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=107 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) 
TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=27 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=51 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=48 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=79 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=76 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=75 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=126 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=121 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=13 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] 
[engine.py:151:__init__] RANK=8 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=58 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=62 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=56 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=59 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=95 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=92 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=42 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=40 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=104 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 
(104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=109 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=105 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=28 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=53 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=74 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=125 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=120 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=63 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=60 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=57 
STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=88 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=94 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=91 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=44 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=108 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=25 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=26 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=72 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=124 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 
(104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=61 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=46 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=77 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=127 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=45 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=122 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-02 09:31:11,835] [INFO] [engine.py:151:__init__] RANK=123 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) > using checkpoint value 6e-05 for learning rate > using checkpoint value 6e-06 for minimum learning rate > using checkpoint value 3750000 for warmup iterations > using checkpoint value 600000000 for total number of iterations > using checkpoint value cosine for decay style [2022-02-02 09:31:35,690] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 92 [2022-02-02 09:31:35,862] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 60 
[2022-02-02 09:31:37,232] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 92 [2022-02-02 09:31:37,357] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 60 [2022-02-02 09:31:38,375] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 61 [2022-02-02 09:31:38,519] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 95 [2022-02-02 09:31:38,541] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 104 [2022-02-02 09:31:38,916] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 56 [2022-02-02 09:31:39,582] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 59 [2022-02-02 09:31:39,764] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 116 [2022-02-02 09:31:39,817] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 68 [2022-02-02 09:31:39,842] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 61 [2022-02-02 09:31:39,873] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 71 [2022-02-02 09:31:39,917] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 93 [2022-02-02 09:31:39,936] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 95 [2022-02-02 09:31:39,993] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 104 [2022-02-02 09:31:40,435] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 56 [2022-02-02 09:31:40,469] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 62 [2022-02-02 
09:31:40,497] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 20 [2022-02-02 09:31:40,536] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 106 [2022-02-02 09:31:40,621] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 58 [2022-02-02 09:31:40,720] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 63 [2022-02-02 09:31:40,733] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 5 [2022-02-02 09:31:40,770] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 88 [2022-02-02 09:31:41,044] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 59 [2022-02-02 09:31:41,102] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 98 [2022-02-02 09:31:41,107] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 110 [2022-02-02 09:31:41,110] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 64 [2022-02-02 09:31:41,228] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 68 [2022-02-02 09:31:41,275] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 116 [2022-02-02 09:31:41,321] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 93 [2022-02-02 09:31:41,336] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 57 [2022-02-02 09:31:41,351] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 65 [2022-02-02 09:31:41,362] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 71 [2022-02-02 09:31:41,439] 
[INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 52 [2022-02-02 09:31:41,698] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 107 [2022-02-02 09:31:41,733] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 70 [2022-02-02 09:31:41,808] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 25 [2022-02-02 09:31:41,954] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 106 [2022-02-02 09:31:41,981] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 73 [2022-02-02 09:31:42,015] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 89 [2022-02-02 09:31:42,050] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 41 [2022-02-02 09:31:42,074] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 91 [2022-02-02 09:31:42,178] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 105 [2022-02-02 09:31:42,205] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 78 [2022-02-02 09:31:42,209] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 88 [2022-02-02 09:31:42,280] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 20 [2022-02-02 09:31:42,338] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 26 [2022-02-02 09:31:42,367] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 112 [2022-02-02 09:31:42,397] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 114 [2022-02-02 09:31:42,564] [INFO] 
[engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 63 [2022-02-02 09:31:42,578] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 62 [2022-02-02 09:31:42,604] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 5 [2022-02-02 09:31:42,616] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 110 [2022-02-02 09:31:42,713] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 58 [2022-02-02 09:31:42,806] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 34 [2022-02-02 09:31:42,895] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 45 [2022-02-02 09:31:42,910] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 64 [2022-02-02 09:31:43,027] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 90 [2022-02-02 09:31:43,075] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 96 [2022-02-02 09:31:43,099] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 98 [2022-02-02 09:31:43,117] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 52 [2022-02-02 09:31:43,148] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 108 [2022-02-02 09:31:43,170] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 109 [2022-02-02 09:31:43,186] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 75 [2022-02-02 09:31:43,187] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 107 [2022-02-02 09:31:43,213] [INFO] 
[engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 65 [2022-02-02 09:31:43,214] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 16 [2022-02-02 09:31:43,218] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 70 [2022-02-02 09:31:43,320] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 103 [2022-02-02 09:31:43,331] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 69 [2022-02-02 09:31:43,351] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 123 [2022-02-02 09:31:43,392] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 89 [2022-02-02 09:31:43,395] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 94 [2022-02-02 09:31:43,464] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 36 [2022-02-02 09:31:43,528] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 91 [2022-02-02 09:31:43,578] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 57 [2022-02-02 09:31:43,653] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 42 [2022-02-02 09:31:43,669] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 105 [2022-02-02 09:31:43,681] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 78 [2022-02-02 09:31:43,682] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 41 [2022-02-02 09:31:43,699] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 25 [2022-02-02 09:31:43,895] [INFO] 
[engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 40 [2022-02-02 09:31:43,973] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 120 [2022-02-02 09:31:43,987] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 80 [2022-02-02 09:31:44,006] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 112 [2022-02-02 09:31:44,017] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 114 [2022-02-02 09:31:44,033] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 86 [2022-02-02 09:31:44,146] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 21 [2022-02-02 09:31:44,158] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 119 [2022-02-02 09:31:44,159] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 26 [2022-02-02 09:31:44,195] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 67 [2022-02-02 09:31:44,225] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 66 [2022-02-02 09:31:44,253] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 34 [2022-02-02 09:31:44,298] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 73 [2022-02-02 09:31:44,323] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 45 [2022-02-02 09:31:44,351] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 118 [2022-02-02 09:31:44,383] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 111 [2022-02-02 09:31:44,395] [INFO] 
[engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 121 [2022-02-02 09:31:44,490] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 90 [2022-02-02 09:31:44,498] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 96 [2022-02-02 09:31:44,546] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 113 [2022-02-02 09:31:44,564] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 75 [2022-02-02 09:31:44,573] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 53 [2022-02-02 09:31:44,614] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 117 [2022-02-02 09:31:44,655] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 16 [2022-02-02 09:31:44,762] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 69 [2022-02-02 09:31:44,779] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 103 [2022-02-02 09:31:44,833] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 77 [2022-02-02 09:31:44,838] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 94 [2022-02-02 09:31:44,872] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 108 [2022-02-02 09:31:44,985] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 8 [2022-02-02 09:31:44,993] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 36 [2022-02-02 09:31:44,993] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 14 [2022-02-02 09:31:45,041] [INFO] 
[engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 44 [2022-02-02 09:31:45,071] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 48 [2022-02-02 09:31:45,094] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 109 [2022-02-02 09:31:45,146] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 123 [2022-02-02 09:31:45,147] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 74 [2022-02-02 09:31:45,153] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 42 [2022-02-02 09:31:45,283] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 11 [2022-02-02 09:31:45,344] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 40 [2022-02-02 09:31:45,414] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 17 [2022-02-02 09:31:45,436] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 39 [2022-02-02 09:31:45,480] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 122 [2022-02-02 09:31:45,496] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 76 [2022-02-02 09:31:45,509] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 35 [2022-02-02 09:31:45,530] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 120 [2022-02-02 09:31:45,558] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 21 [2022-02-02 09:31:45,562] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 80 [2022-02-02 09:31:45,592] [INFO] 
[engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 86 [2022-02-02 09:31:45,596] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 82 [2022-02-02 09:31:45,623] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 28 [2022-02-02 09:31:45,686] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 119 [2022-02-02 09:31:45,793] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 4 [2022-02-02 09:31:45,810] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 54 [2022-02-02 09:31:45,848] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 111 [2022-02-02 09:31:45,941] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 67 [2022-02-02 09:31:45,950] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 121 [2022-02-02 09:31:45,953] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 79 [2022-02-02 09:31:45,975] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 66 [2022-02-02 09:31:46,021] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 118 [2022-02-02 09:31:46,032] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 53 [2022-02-02 09:31:46,036] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 29 [2022-02-02 09:31:46,044] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 113 [2022-02-02 09:31:46,130] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 10 [2022-02-02 09:31:46,142] [INFO] 
[engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 72 [2022-02-02 09:31:46,155] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 43 [2022-02-02 09:31:46,169] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 32 [2022-02-02 09:31:46,183] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 47 [2022-02-02 09:31:46,237] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 77 [2022-02-02 09:31:46,464] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 12 [2022-02-02 09:31:46,468] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 30 [2022-02-02 09:31:46,488] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 48 [2022-02-02 09:31:46,507] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 44 [2022-02-02 09:31:46,522] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 1 [2022-02-02 09:31:46,560] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 117 [2022-02-02 09:31:46,608] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 15 [2022-02-02 09:31:46,643] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 19 [2022-02-02 09:31:46,676] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 102 [2022-02-02 09:31:46,703] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 13 [2022-02-02 09:31:46,736] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 23 [2022-02-02 09:31:46,738] [INFO] 
[engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 27 [2022-02-02 09:31:46,764] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 14 [2022-02-02 09:31:46,810] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 11 [2022-02-02 09:31:46,942] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 8 [2022-02-02 09:31:46,955] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 17 [2022-02-02 09:31:46,961] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 85 [2022-02-02 09:31:46,971] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 9 [2022-02-02 09:31:47,018] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 39 [2022-02-02 09:31:47,036] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 76 [2022-02-02 09:31:47,037] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 46 [2022-02-02 09:31:47,072] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 35 [2022-02-02 09:31:47,073] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 74 [2022-02-02 09:31:47,085] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 82 [2022-02-02 09:31:47,095] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 84 [2022-02-02 09:31:47,112] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 83 [2022-02-02 09:31:47,193] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 31 [2022-02-02 09:31:47,213] [INFO] 
[engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 81 [2022-02-02 09:31:47,215] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 18 [2022-02-02 09:31:47,255] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 4 [2022-02-02 09:31:47,267] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 54 [2022-02-02 09:31:47,269] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 2 [2022-02-02 09:31:47,283] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 33 [2022-02-02 09:31:47,338] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 101 [2022-02-02 09:31:47,371] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 79 [2022-02-02 09:31:47,397] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 122 [2022-02-02 09:31:47,454] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 125 [2022-02-02 09:31:47,470] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 97 [2022-02-02 09:31:47,492] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 22 [2022-02-02 09:31:47,603] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 127 [2022-02-02 09:31:47,605] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 43 [2022-02-02 09:31:47,617] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 124 [2022-02-02 09:31:47,650] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 10 [2022-02-02 09:31:47,677] [INFO] 
[engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 38 [2022-02-02 09:31:47,679] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 51 [2022-02-02 09:31:47,693] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 24 [2022-02-02 09:31:47,723] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 50 [2022-02-02 09:31:47,753] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 55 [2022-02-02 09:31:47,753] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 32 [2022-02-02 09:31:47,838] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 28 [2022-02-02 09:31:47,868] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 47 [2022-02-02 09:31:47,878] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 72 [2022-02-02 09:31:47,890] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 29 [2022-02-02 09:31:47,917] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 87 [2022-02-02 09:31:47,927] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 7 [2022-02-02 09:31:47,936] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 49 [2022-02-02 09:31:47,992] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 99 [2022-02-02 09:31:47,993] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 115 [2022-02-02 09:31:48,040] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 37 [2022-02-02 09:31:48,058] [INFO] 
[engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 102 [2022-02-02 09:31:48,083] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 3 [2022-02-02 09:31:48,220] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 19 [2022-02-02 09:31:48,235] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 6 [2022-02-02 09:31:48,267] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 30 [2022-02-02 09:31:48,298] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 12 [2022-02-02 09:31:48,314] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 1 [2022-02-02 09:31:48,321] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 126 [2022-02-02 09:31:48,398] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 13 [2022-02-02 09:31:48,423] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 15 [2022-02-02 09:31:48,423] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 46 [2022-02-02 09:31:48,453] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 27 [2022-02-02 09:31:48,552] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 83 [2022-02-02 09:31:48,571] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 9 [2022-02-02 09:31:48,646] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 100 [2022-02-02 09:31:48,700] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 0 [2022-02-02 09:31:48,706] [INFO] [engine.py:2602:_load_zero_checkpoint] 
loading 1 zero partition checkpoints for rank 33 [2022-02-02 09:31:48,765] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 85 [2022-02-02 09:31:48,777] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 31 [2022-02-02 09:31:48,838] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 23 [2022-02-02 09:31:48,931] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 84 [2022-02-02 09:31:48,964] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 18 [2022-02-02 09:31:48,969] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 101 [2022-02-02 09:31:49,079] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 38 [2022-02-02 09:31:49,095] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 22 [2022-02-02 09:31:49,160] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 97 [2022-02-02 09:31:49,209] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 81 [2022-02-02 09:31:49,269] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 24 [2022-02-02 09:31:49,338] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 51 [2022-02-02 09:31:49,388] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 87 [2022-02-02 09:31:49,445] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 49 [2022-02-02 09:31:49,449] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 127 [2022-02-02 09:31:49,508] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 125 [2022-02-02 
09:31:49,550] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 50 [2022-02-02 09:31:49,571] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 99 [2022-02-02 09:31:49,603] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 2 [2022-02-02 09:31:49,646] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 37 [2022-02-02 09:31:49,657] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 124 [2022-02-02 09:31:49,709] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 115 [2022-02-02 09:31:49,711] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 55 [2022-02-02 09:31:49,724] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 7 [2022-02-02 09:31:49,995] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 126 [2022-02-02 09:31:50,037] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 100 [2022-02-02 09:31:50,111] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 3 [2022-02-02 09:31:50,192] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 6 [2022-02-02 09:31:50,735] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 0 checkpoint version 3.0 successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 at iteration 16000 time (ms) | load-checkpoint: 37611.06 /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/utils.py:279: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings 
warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 125.22432 estimated model parameters without embeddings: 103.368064 [after model, optimizer, and learning rate scheduler are built] datetime: 2022-02-02 09:31:50 > building train, validation, and test datasets ... > datasets target sizes (minimum size): train: 600000000 validation: 20008960 test: 10240 > building train, validation, and test datasets for GPT ... > building dataset index ... reading sizes... reading pointers... reading document index... creating numpy buffer of mmap... creating memory view of numpy buffer... > finished creating indexed dataset in 0.424997 seconds number of documents: 304230423 > dataset split: train: document indices in [0, 288714672) total of 288714672 documents validation: document indices in [288714672, 303926193) total of 15211521 documents test: document indices in [303926193, 304230423) total of 304230 documents > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_doc_idx.npy > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_sample_idx.npy > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_shuffle_idx.npy loaded indexed file in 0.168 seconds total number of samples: 657686117 total number of epochs: 5 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_20008960ns_2048sl_43s_doc_idx.npy > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_20008960ns_2048sl_43s_sample_idx.npy > loading shuffle-idx mapping from 
/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_20008960ns_2048sl_43s_shuffle_idx.npy loaded indexed file in 0.186 seconds total number of samples: 20781483 total number of epochs: 3 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_doc_idx.npy > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_sample_idx.npy > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_shuffle_idx.npy loaded indexed file in 0.097 seconds total number of samples: 137384 total number of epochs: 1 > finished creating GPT datasets ... [after dataloaders are built] datetime: 2022-02-02 09:31:59 done with setup ... training ... Number of parameters: [tensor rank - pipeline rank] w/ and w/o embeddings: time (ms) | model-and-optimizer-setup: 49480.53 | train/valid/test-data-iterators-setup: 7640.95 [001-001] 103.3651B / 103.3651B [003-001] 103.3651B / 103.3651B [003-000] 125.2243B / 103.3681B [002-001] 103.3651B / 103.3651B [002-030] 103.3651B / 103.3651B [001-008] 103.3651B / 103.3651B [001-009] 103.3651B / 103.3651B [003-008] 103.3651B / 103.3651B [002-017] 103.3651B / 103.3651B [003-016] 103.3651B / 103.3651B [001-017] 103.3651B / 103.3651B [003-017] 103.3651B / 103.3651B [003-010] 103.3651B / 103.3651B [001-010] 103.3651B / 103.3651B [003-011] 103.3651B / 103.3651B [002-011] 103.3651B / 103.3651B [002-028] 103.3651B / 103.3651B [003-028] 103.3651B / 103.3651B [001-025] 103.3651B / 103.3651B [003-024] 103.3651B / 103.3651B [002-013] 103.3651B / 103.3651B [002-012] 103.3651B / 103.3651B [003-012] 103.3651B / 103.3651B [001-012] 103.3651B / 103.3651B [002-002] 103.3651B / 103.3651B [003-002] 103.3651B / 103.3651B [003-020] 103.3651B / 103.3651B [001-020] 103.3651B / 103.3651B [003-021] 103.3651B / 103.3651B 
[001-026] 103.3651B / 103.3651B [003-004] 103.3651B / 103.3651B [001-030] 103.3651B / 103.3651B [002-015] 103.3651B / 103.3651B [003-014] 103.3651B / 103.3651B [003-015] 103.3651B / 103.3651B [001-016] 103.3651B / 103.3651B [002-016] 103.3651B / 103.3651B [001-023] 103.3651B / 103.3651B [002-022] 103.3651B / 103.3651B [002-025] 103.3651B / 103.3651B [002-010] 103.3651B / 103.3651B [001-011] 103.3651B / 103.3651B [001-000] 125.2243B / 103.3681B [002-000] 125.2243B / 103.3681B [003-006] 103.3651B / 103.3651B [001-006] 103.3651B / 103.3651B [003-005] 103.3651B / 103.3651B [003-018] 103.3651B / 103.3651B [002-018] 103.3651B / 103.3651B [002-019] 103.3651B / 103.3651B [001-019] 103.3651B / 103.3651B [002-009] 103.3651B / 103.3651B [002-008] 103.3651B / 103.3651B [001-028] 103.3651B / 103.3651B [002-029] 103.3651B / 103.3651B [001-029] 103.3651B / 103.3651B [003-030] 103.3651B / 103.3651B [003-003] 103.3651B / 103.3651B [002-003] 103.3651B / 103.3651B [001-003] 103.3651B / 103.3651B [001-014] 103.3651B / 103.3651B [001-015] 103.3651B / 103.3651B [003-022] 103.3651B / 103.3651B [001-022] 103.3651B / 103.3651B [003-023] 103.3651B / 103.3651B [001-024] 103.3651B / 103.3651B [003-025] 103.3651B / 103.3651B [001-021] 103.3651B / 103.3651B [001-027] 103.3651B / 103.3651B [000-001] 103.3651B / 103.3651B [001-007] 103.3651B / 103.3651B [002-006] 103.3651B / 103.3651B [002-007] 103.3651B / 103.3651B [003-007] 103.3651B / 103.3651B [001-004] 103.3651B / 103.3651B [001-005] 103.3651B / 103.3651B [002-005] 103.3651B / 103.3651B [001-013] 103.3651B / 103.3651B [003-019] 103.3651B / 103.3651B [001-018] 103.3651B / 103.3651B [003-009] 103.3651B / 103.3651B [003-029] 103.3651B / 103.3651B [003-031] 125.2273B / 103.3710B [001-002] 103.3651B / 103.3651B [002-014] 103.3651B / 103.3651B [000-017] 103.3651B / 103.3651B [002-023] 103.3651B / 103.3651B [002-024] 103.3651B / 103.3651B [002-020] 103.3651B / 103.3651B [000-010] 103.3651B / 103.3651B [003-027] 103.3651B / 103.3651B [002-004] 103.3651B / 
103.3651B [003-013] 103.3651B / 103.3651B [000-009] 103.3651B / 103.3651B [000-008] 103.3651B / 103.3651B [001-031] 125.2273B / 103.3710B [002-031] 125.2273B / 103.3710B [000-016] 103.3651B / 103.3651B [000-022] 103.3651B / 103.3651B [000-024] 103.3651B / 103.3651B [002-021] 103.3651B / 103.3651B [000-011] 103.3651B / 103.3651B [003-026] 103.3651B / 103.3651B [002-026] 103.3651B / 103.3651B [000-007] 103.3651B / 103.3651B [000-012] 103.3651B / 103.3651B [000-013] 103.3651B / 103.3651B [000-018] 103.3651B / 103.3651B [000-028] 103.3651B / 103.3651B [000-029] 103.3651B / 103.3651B [000-002] 103.3651B / 103.3651B [000-014] 103.3651B / 103.3651B [000-025] 103.3651B / 103.3651B [000-020] 103.3651B / 103.3651B [002-027] 103.3651B / 103.3651B [000-030] 103.3651B / 103.3651B [000-003] 103.3651B / 103.3651B [000-021] 103.3651B / 103.3651B [000-027] 103.3651B / 103.3651B [000-006] 103.3651B / 103.3651B [000-031] 125.2273B / 103.3710B [000-026] 103.3651B / 103.3651B [000-019] 103.3651B / 103.3651B [000-005] 103.3651B / 103.3651B [000-004] 103.3651B / 103.3651B [000-000] 125.2243B / 103.3681B [000-023] 103.3651B / 103.3651B [000-015] 103.3651B / 103.3651B [before the start of training step] datetime: 2022-02-02 09:31:59 [2022-02-02 09:31:59,287] [INFO] [checkpointing.py:547:forward] Activation Checkpointing Information [2022-02-02 09:31:59,287] [INFO] [checkpointing.py:548:forward] ----Partition Activations False, CPU CHECKPOINTING False [2022-02-02 09:31:59,287] [INFO] [checkpointing.py:551:forward] ----contiguous Memory Checkpointing False with 64 total layers [2022-02-02 09:31:59,287] [INFO] [checkpointing.py:554:forward] ----Synchronization False [2022-02-02 09:31:59,287] [INFO] [checkpointing.py:555:forward] ----Profiling time in checkpointing False Killing subprocess 23473 Killing subprocess 23474 Killing subprocess 23475 Killing subprocess 22990 Killing subprocess 23476 Killing subprocess 23477 Killing subprocess 22991 Killing subprocess 22992 Killing subprocess 23478 
Killing subprocess 23480 Killing subprocess 23482 Main process received SIGTERM, exiting Killing subprocess 22757 Killing subprocess 22758 Killing subprocess 22759 Killing subprocess 22994 Killing subprocess 22995 Killing subprocess 22996 Killing subprocess 22998 Killing subprocess 23000 Main process received SIGTERM, exiting Killing subprocess 22425 Killing subprocess 22426 Killing subprocess 22427 Killing subprocess 22428 Killing subprocess 22760 Killing subprocess 22429 Killing subprocess 22761 Killing subprocess 22430 Killing subprocess 22762 Killing subprocess 22432 Killing subprocess 22764 Killing subprocess 22766 Main process received SIGTERM, exiting Killing subprocess 22433 Main process received SIGTERM, exiting Killing subprocess 26367 Killing subprocess 26368 Killing subprocess 26369 Killing subprocess 26370 Killing subprocess 22770 Killing subprocess 22771 Killing subprocess 22772 Killing subprocess 22773 Killing subprocess 22774 Killing subprocess 22775 Killing subprocess 22777 Killing subprocess 22779 Main process received SIGTERM, exiting Killing subprocess 26371 Killing subprocess 26372 Killing subprocess 26374 Killing subprocess 26377 Main process received SIGTERM, exiting Killing subprocess 23081 Killing subprocess 22027 Killing subprocess 23082 Killing subprocess 22028 Killing subprocess 22029 Killing subprocess 22030 Killing subprocess 23083 Killing subprocess 22389 Killing subprocess 22393 Killing subprocess 22390 Killing subprocess 22182 Killing subprocess 22394 Killing subprocess 22844 Killing subprocess 22391 Killing subprocess 23084 Killing subprocess 22183 Killing subprocess 23085 Killing subprocess 23086 Killing subprocess 22789 Killing subprocess 22395 Killing subprocess 23088 Killing subprocess 23090 Killing subprocess 22845 Killing subprocess 22184 Killing subprocess 22846 Killing subprocess 22790 Killing subprocess 22185 Killing subprocess 22791 Killing subprocess 22031 Killing subprocess 22392 Killing subprocess 22032 Killing 
subprocess 22393 Killing subprocess 22034 Killing subprocess 22394 Killing subprocess 22035 Killing subprocess 22396 Main process received SIGTERM, exiting Killing subprocess 22399 Killing subprocess 22916 Killing subprocess 23143 Killing subprocess 22847 Killing subprocess 22848 Killing subprocess 22849 Killing subprocess 22917 Killing subprocess 22851 Killing subprocess 22852 Killing subprocess 22186 Killing subprocess 22187 Killing subprocess 22188 Killing subprocess 22191 Main process received SIGTERM, exiting Killing subprocess 23144 Killing subprocess 22918 Killing subprocess 23145 Killing subprocess 23146 Killing subprocess 22396 Killing subprocess 22397 Killing subprocess 22399 Killing subprocess 22401 Killing subprocess 22403 Main process received SIGTERM, exiting Killing subprocess 22919 Killing subprocess 22920 Killing subprocess 22921 Main process received SIGTERM, exiting Killing subprocess 22792 Killing subprocess 22793 Killing subprocess 22794 Killing subprocess 22795 Killing subprocess 22798 Main process received SIGTERM, exiting slurmstepd: error: *** STEP 1650590.0 ON jean-zay-iam01 CANCELLED AT 2022-02-02T09:34:08 *** Killing subprocess 23147 Killing subprocess 23149 Killing subprocess 23150 Killing subprocess 23152 Main process received SIGTERM, exiting Main process received SIGTERM, exiting Killing subprocess 24082 Main process received SIGTERM, exiting Killing subprocess 24083 Killing subprocess 24084 Killing subprocess 24085 Killing subprocess 24086 Killing subprocess 24087 Killing subprocess 24089 Killing subprocess 24091 Killing subprocess 22923 Killing subprocess 22925 Main process received SIGTERM, exiting Main process received SIGTERM, exiting ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** > setting tensorboard ... using world size: 128, data-parallel-size: 1, tensor-model-parallel size: 4, pipeline-model-parallel size: 32 using torch.float16 for parameters ... ------------------------ arguments ------------------------ accumulate_allreduce_grads_in_fp32 .............. False adam_beta1 ...................................... 0.9 adam_beta2 ...................................... 0.95 adam_eps ........................................ 1e-08 adlr_autoresume ................................. False adlr_autoresume_interval ........................ 1000 apply_query_key_layer_scaling ................... True apply_residual_connection_post_layernorm ........ False attention_dropout ............................... 0.1 attention_softmax_in_fp32 ....................... False bert_binary_head ................................ True bert_load ....................................... None bf16 ............................................ False bias_dropout_fusion ............................. True bias_gelu_fusion ................................ 
True biencoder_projection_dim ........................ 0 biencoder_shared_query_context_model ............ False block_data_path ................................. None checkpoint_activations .......................... True checkpoint_in_cpu ............................... False checkpoint_num_layers ........................... 1 clip_grad ....................................... 1.0 codecarbon_dir .................................. None consumed_train_samples .......................... 0 consumed_train_tokens ........................... 0 consumed_valid_samples .......................... 0 contigious_checkpointing ........................ False cpu_optimizer ................................... False cpu_torch_adam .................................. False curriculum_learning ............................. False data_impl ....................................... mmap data_parallel_size .............................. 1 data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document'] dataloader_type ................................. single DDP_impl ........................................ local decoder_seq_length .............................. None deepscale ....................................... False deepscale_config ................................ None deepspeed ....................................... True deepspeed_activation_checkpointing .............. True deepspeed_config ................................ ./ds_config.1650591.json deepspeed_mpi ................................... False distribute_checkpointed_activations ............. False distributed_backend ............................. nccl embed_layernorm ................................. True embedding_path .................................. None encoder_seq_length .............................. 2048 eod_mask_loss ................................... False eval_interval ................................... 
150 eval_iters ...................................... 5 eval_only ....................................... None evidence_data_path .............................. None exit_duration_in_mins ........................... 1185 exit_interval ................................... None ffn_hidden_size ................................. 46400 finetune ........................................ False fp16 ............................................ True fp16_lm_cross_entropy ........................... False fp32_residual_connection ........................ False gigaflos_no_embeds .............................. 0 global_batch_size ............................... 2048 glu_activation .................................. None hidden_dropout .................................. 0.1 hidden_size ..................................... 11600 hysteresis ...................................... 2 ict_head_size ................................... None ict_load ........................................ None img_dim ......................................... 224 indexer_batch_size .............................. 128 indexer_log_interval ............................ 1000 init_method_std ................................. 0.006 init_method_xavier_uniform ...................... False initial_loss_scale .............................. 4294967296 kv_channels ..................................... 145 layernorm_epsilon ............................... 1e-05 lazy_mpu_init ................................... None load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 local_rank ...................................... 0 log_batch_size_to_tensorboard ................... True log_interval .................................... 1 log_learning_rate_to_tensorboard ................ True log_level ....................................... None log_level_replica ............................... None log_loss_scale_to_tensorboard ................... 
True log_num_zeros_in_grad ........................... False log_params_norm ................................. False log_timers_to_tensorboard ....................... True log_validation_ppl_to_tensorboard ............... True loss_on_targets_only ............................ False loss_scale ...................................... 12.0 loss_scale_window ............................... 1000 lr .............................................. 6e-05 lr_decay_iters .................................. None lr_decay_samples ................................ None lr_decay_style .................................. cosine lr_decay_tokens ................................. 260000000000 lr_warmup_fraction .............................. None lr_warmup_iters ................................. 0 lr_warmup_samples ............................... 3750000 make_vocab_size_divisible_by .................... 128 mask_prob ....................................... 0.15 masked_softmax_fusion ........................... True max_position_embeddings ......................... 2048 memory_centric_tiled_linear ..................... False merge_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt micro_batch_size ................................ 1 min_loss_scale .................................. 1.0 min_lr .......................................... 6e-06 mmap_warmup ..................................... False no_load_optim ................................... None no_load_rng ..................................... None no_save_optim ................................... None no_save_rng ..................................... None num_attention_heads ............................. 80 num_channels .................................... 3 num_classes ..................................... 1000 num_layers ...................................... 64 num_layers_per_virtual_pipeline_stage ........... 
None num_workers ..................................... 2 onnx_safe ....................................... None openai_gelu ..................................... False optimizer ....................................... adam override_lr_scheduler ........................... False params_dtype .................................... torch.float16 partition_activations ........................... False patch_dim ....................................... 16 pipeline_model_parallel_size .................... 32 position_embedding_type ......................... PositionEmbeddingType.absolute profile_backward ................................ False query_in_block_prob ............................. 0.1 rampup_batch_size ............................... None rank ............................................ 0 remote_device ................................... none reset_attention_mask ............................ False reset_position_ids .............................. False retriever_report_topk_accuracies ................ [] retriever_score_scaling ......................... False retriever_seq_length ............................ 256 reweight_loss_based_on_position_frequency ....... False sample_rate ..................................... 1.0 save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 save_interval ................................... 50 scatter_gather_tensors_in_pipeline .............. True scattered_embeddings ............................ False seed ............................................ 43 seq_length ...................................... 2048 sgd_momentum .................................... 0.9 short_seq_prob .................................. 0.1 skip_train_iteration_range ...................... None split ........................................... 949,50,1 split_transformers .............................. False synchronize_each_layer .......................... 
False tensor_model_parallel_size ...................... 4 tensorboard_debug_dir ........................... None tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs//tensorboard/cl-a100 tensorboard_log_interval ........................ 1 tensorboard_queue_size .......................... 5 test_weighted_split_names ....................... None test_weighted_split_paths ....................... None test_weighted_split_splits ...................... None test_weighted_split_weights ..................... None tile_factor ..................................... 1 titles_data_path ................................ None tokenizer_name_or_path .......................... None tokenizer_type .................................. GPT2BPETokenizer train_iters ..................................... None train_samples ................................... 600000000 train_tokens .................................... 300000000000 train_weighted_split_paths ...................... None use_bnb_optimizer ............................... False use_checkpoint_lr_scheduler ..................... False use_contiguous_buffers_in_ddp ................... False use_cpu_initialization .......................... None use_one_sent_docs ............................... False use_pin_memory .................................. False valid_weighted_split_names ...................... None valid_weighted_split_paths ...................... None valid_weighted_split_splits ..................... None valid_weighted_split_weights .................... None virtual_pipeline_model_parallel_size ............ None vocab_extra_ids ................................. 0 vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json weight_decay .................................... 0.1 world_size ...................................... 128 zero_allgather_bucket_size ...................... 
0.0 zero_contigious_gradients ....................... False zero_reduce_bucket_size ......................... 0.0 zero_reduce_scatter ............................. False zero_stage ...................................... 1 -------------------- end of arguments --------------------- setting number of micro-batches to constant 2048 > building GPT2BPETokenizer tokenizer ... > padded vocab (size: 50257) with 431 dummy tokens (new size: 50688) -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] cpu_adagrad ............ [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] deepspeed info ................... 0.6.0+24fe7002, 24fe7002, elastic-ckpt-refresh deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=f689231 git_branch=log-grad-norm **** > initializing torch distributed ... > initializing tensor model parallel with size 4 > initializing pipeline model parallel with size 32 > setting random seeds to 43 ... [2022-02-02 09:34:50,607] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2761 and data parallel seed: 43 > compiling dataset index builder ... make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/data' make: Nothing to be done for 'default'. make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/data' >>> done with dataset index builder. Compilation time: 0.131 seconds > compiling and loading fused kernels ... /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. 
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module scaled_upper_triang_masked_softmax_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module scaled_upper_triang_masked_softmax_cuda... /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module scaled_masked_softmax_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module scaled_masked_softmax_cuda... /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module fused_mix_prec_layer_norm_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module fused_mix_prec_layer_norm_cuda... ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** using world size: 128, data-parallel-size: 1, tensor-model-parallel size: 4, pipeline-model-parallel size: 32 using torch.float16 for parameters ... ------------------------ arguments ------------------------ accumulate_allreduce_grads_in_fp32 .............. False adam_beta1 ...................................... 0.9 adam_beta2 ...................................... 0.95 adam_eps ........................................ 1e-08 adlr_autoresume ................................. False adlr_autoresume_interval ........................ 1000 apply_query_key_layer_scaling ................... True apply_residual_connection_post_layernorm ........ False attention_dropout ............................... 0.1 attention_softmax_in_fp32 ....................... False bert_binary_head ................................ True bert_load ....................................... None bf16 ............................................ False bias_dropout_fusion ............................. True bias_gelu_fusion ................................ True biencoder_projection_dim ........................ 0 biencoder_shared_query_context_model ............ False block_data_path ................................. None checkpoint_activations .......................... True checkpoint_in_cpu ............................... False checkpoint_num_layers ........................... 1 clip_grad ....................................... 1.0 codecarbon_dir .................................. None consumed_train_samples .......................... 0 consumed_train_tokens ........................... 0 consumed_valid_samples .......................... 
0 contigious_checkpointing ........................ False cpu_optimizer ................................... False cpu_torch_adam .................................. False curriculum_learning ............................. False data_impl ....................................... mmap data_parallel_size .............................. 1 data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document'] dataloader_type ................................. single DDP_impl ........................................ local decoder_seq_length .............................. None deepscale ....................................... False deepscale_config ................................ None deepspeed ....................................... True deepspeed_activation_checkpointing .............. True deepspeed_config ................................ ./ds_config.1737098.json deepspeed_mpi ................................... False distribute_checkpointed_activations ............. False distributed_backend ............................. nccl embed_layernorm ................................. True embedding_path .................................. None encoder_seq_length .............................. 2048 eod_mask_loss ................................... False eval_interval ................................... 150 eval_iters ...................................... 5 eval_only ....................................... None evidence_data_path .............................. None exit_duration_in_mins ........................... 1185 exit_interval ................................... None ffn_hidden_size ................................. 46400 finetune ........................................ False fp16 ............................................ True fp16_lm_cross_entropy ........................... False fp32_residual_connection ........................ False gigaflos_no_embeds .............................. 
0 global_batch_size ............................... 2048 glu_activation .................................. None hidden_dropout .................................. 0.1 hidden_size ..................................... 11600 hysteresis ...................................... 2 ict_head_size ................................... None ict_load ........................................ None img_dim ......................................... 224 indexer_batch_size .............................. 128 indexer_log_interval ............................ 1000 init_method_std ................................. 0.006 init_method_xavier_uniform ...................... False initial_loss_scale .............................. 4294967296 kv_channels ..................................... 145 layernorm_epsilon ............................... 1e-05 lazy_mpu_init ................................... None load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 local_rank ...................................... 0 log_batch_size_to_tensorboard ................... True log_interval .................................... 1 log_learning_rate_to_tensorboard ................ True log_level ....................................... None log_level_replica ............................... None log_loss_scale_to_tensorboard ................... True log_num_zeros_in_grad ........................... False log_params_norm ................................. False log_timers_to_tensorboard ....................... True log_validation_ppl_to_tensorboard ............... True loss_on_targets_only ............................ False loss_scale ...................................... 12.0 loss_scale_window ............................... 1000 lr .............................................. 6e-05 lr_decay_iters .................................. None lr_decay_samples ................................ None lr_decay_style .................................. 
cosine lr_decay_tokens ................................. 260000000000 lr_warmup_fraction .............................. None lr_warmup_iters ................................. 0 lr_warmup_samples ............................... 3750000 make_vocab_size_divisible_by .................... 128 mask_prob ....................................... 0.15 masked_softmax_fusion ........................... True max_position_embeddings ......................... 2048 memory_centric_tiled_linear ..................... False merge_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt micro_batch_size ................................ 1 min_loss_scale .................................. 1.0 min_lr .......................................... 6e-06 mmap_warmup ..................................... False no_load_optim ................................... None no_load_rng ..................................... None no_save_optim ................................... None no_save_rng ..................................... None num_attention_heads ............................. 80 num_channels .................................... 3 num_classes ..................................... 1000 num_layers ...................................... 64 num_layers_per_virtual_pipeline_stage ........... None num_workers ..................................... 2 onnx_safe ....................................... None openai_gelu ..................................... False optimizer ....................................... adam override_lr_scheduler ........................... False params_dtype .................................... torch.float16 partition_activations ........................... False patch_dim ....................................... 16 pipeline_model_parallel_size .................... 32 position_embedding_type ......................... PositionEmbeddingType.absolute profile_backward ................................ 
False query_in_block_prob ............................. 0.1 rampup_batch_size ............................... None rank ............................................ 0 remote_device ................................... none reset_attention_mask ............................ False reset_position_ids .............................. False retriever_report_topk_accuracies ................ [] retriever_score_scaling ......................... False retriever_seq_length ............................ 256 reweight_loss_based_on_position_frequency ....... False sample_rate ..................................... 1.0 save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 save_interval ................................... 50 scatter_gather_tensors_in_pipeline .............. True scattered_embeddings ............................ False seed ............................................ 43 seq_length ...................................... 2048 sgd_momentum .................................... 0.9 short_seq_prob .................................. 0.1 skip_train_iteration_range ...................... None split ........................................... 949,50,1 split_transformers .............................. False synchronize_each_layer .......................... False tensor_model_parallel_size ...................... 4 tensorboard_debug_dir ........................... None tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs//tensorboard/cl-a100 tensorboard_log_interval ........................ 1 tensorboard_queue_size .......................... 5 test_weighted_split_names ....................... None test_weighted_split_paths ....................... None test_weighted_split_splits ...................... None test_weighted_split_weights ..................... None tile_factor ..................................... 
1 titles_data_path ................................ None tokenizer_name_or_path .......................... None tokenizer_type .................................. GPT2BPETokenizer train_iters ..................................... None train_samples ................................... 600000000 train_tokens .................................... 300000000000 train_weighted_split_paths ...................... None use_bnb_optimizer ............................... False use_checkpoint_lr_scheduler ..................... False use_contiguous_buffers_in_ddp ................... False use_cpu_initialization .......................... None use_one_sent_docs ............................... False use_pin_memory .................................. False valid_weighted_split_names ...................... None valid_weighted_split_paths ...................... None valid_weighted_split_splits ..................... None valid_weighted_split_weights .................... None virtual_pipeline_model_parallel_size ............ None vocab_extra_ids ................................. 0 vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json weight_decay .................................... 0.1 world_size ...................................... 128 zero_allgather_bucket_size ...................... 0.0 zero_contigious_gradients ....................... False zero_reduce_bucket_size ......................... 0.0 zero_reduce_scatter ............................. False zero_stage ...................................... 1 -------------------- end of arguments --------------------- setting number of micro-batches to constant 2048 > building GPT2BPETokenizer tokenizer ... 
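The argument dump above contains several values that follow from the others. The sketch below recomputes them as a sanity check. The input numbers are copied from the log; the function name `derive` and the dictionary layout are illustrative only, and the formulas are the usual Megatron-style defaults (assumed here, not read from this run's source tree). The vocabulary size 50257 comes from the tokenizer line that follows, and the padding rule (round up to a multiple of make_vocab_size_divisible_by times tensor_model_parallel_size) is likewise an assumption about how the padded size is obtained.

```python
# Sanity-check sketch: recompute derived settings from the logged arguments.
# All inputs are copied from the log; the formulas are assumed Megatron-style defaults.

def derive(args):
    """Return quantities that should be consistent with the logged arguments."""
    tp = args["tensor_model_parallel_size"]
    pp = args["pipeline_model_parallel_size"]

    # GPUs left over for data parallelism once tensor and pipeline groups are formed.
    dp = args["world_size"] // (tp * pp)

    # Micro-batches accumulated per optimizer step on each data-parallel replica.
    num_micro_batches = args["global_batch_size"] // (args["micro_batch_size"] * dp)

    # Per-head projection width and the conventional 4x feed-forward width.
    kv_channels = args["hidden_size"] // args["num_attention_heads"]
    ffn_hidden_size = 4 * args["hidden_size"]

    # Round the vocabulary up so it splits evenly across tensor-parallel ranks.
    multiple = args["make_vocab_size_divisible_by"] * tp
    padded_vocab = -(-args["vocab_size"] // multiple) * multiple  # ceiling division

    return {
        "data_parallel_size": dp,
        "num_micro_batches": num_micro_batches,
        "kv_channels": kv_channels,
        "ffn_hidden_size": ffn_hidden_size,
        "padded_vocab_size": padded_vocab,
        "dummy_tokens": padded_vocab - args["vocab_size"],
    }


logged = {
    "world_size": 128,
    "tensor_model_parallel_size": 4,
    "pipeline_model_parallel_size": 32,
    "global_batch_size": 2048,
    "micro_batch_size": 1,
    "hidden_size": 11600,
    "num_attention_heads": 80,
    "make_vocab_size_divisible_by": 128,
    "vocab_size": 50257,  # reported by the tokenizer line, not by the argument dump
}

if __name__ == "__main__":
    for name, value in derive(logged).items():
        print(f"{name}: {value}")
```

With these inputs the recomputed values agree with what the log reports: data_parallel_size 1, 2048 micro-batches, kv_channels 145, ffn_hidden_size 46400, and a padded vocabulary of 50688 (431 dummy tokens).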
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688) -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] cpu_adagrad ............ [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] > setting tensorboard ...  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] deepspeed info ................... 
0.6.0+24fe7002, 24fe7002, elastic-ckpt-refresh deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=f689231 git_branch=log-grad-norm **** > initializing torch distributed ... > initializing tensor model parallel with size 4 > initializing pipeline model parallel with size 32 > setting random seeds to 43 ... [2022-02-04 20:35:18,168] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2761 and data parallel seed: 43 > compiling dataset index builder ... make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/data' make: Nothing to be done for 'default'. make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/data' >>> done with dataset index builder. Compilation time: 0.207 seconds > compiling and loading fused kernels ... /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... 
Building extension module scaled_upper_triang_masked_softmax_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [1/2] /gpfslocalsys/cuda/11.2/bin/nvcc --generate-dependencies-with-compile --dependency-output scaled_upper_triang_masked_softmax_cuda.cuda.o.d -DTORCH_EXTENSION_NAME=scaled_upper_triang_masked_softmax_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include/TH -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include/THC -isystem /gpfslocalsys/cuda/11.2/include -isystem /gpfswork/rech/six/commun/conda/cutting-edge/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -O3 --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda -std=c++14 -c /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_cuda.cu -o scaled_upper_triang_masked_softmax_cuda.cuda.o [2/2] c++ scaled_upper_triang_masked_softmax.o scaled_upper_triang_masked_softmax_cuda.cuda.o -shared -L/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/gpfslocalsys/cuda/11.2/lib64 -lcudart -o 
scaled_upper_triang_masked_softmax_cuda.so Loading extension module scaled_upper_triang_masked_softmax_cuda... Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module scaled_masked_softmax_cuda... Allowing ninja to set a default number of workers... 
(overridable by setting the environment variable MAX_JOBS=N) [1/2] /gpfslocalsys/cuda/11.2/bin/nvcc --generate-dependencies-with-compile --dependency-output scaled_masked_softmax_cuda.cuda.o.d -DTORCH_EXTENSION_NAME=scaled_masked_softmax_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include/TH -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include/THC -isystem /gpfslocalsys/cuda/11.2/include -isystem /gpfswork/rech/six/commun/conda/cutting-edge/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -O3 --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda -std=c++14 -c /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_cuda.cu -o scaled_masked_softmax_cuda.cuda.o /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h(320): warning: variable "batch_count" was declared but never referenced [2/2] c++ scaled_masked_softmax.o scaled_masked_softmax_cuda.cuda.o -shared -L/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/gpfslocalsys/cuda/11.2/lib64 -lcudart -o 
scaled_masked_softmax_cuda.so Loading extension module scaled_masked_softmax_cuda... Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module fused_mix_prec_layer_norm_cuda... Allowing ninja to set a default number of workers... 
(overridable by setting the environment variable MAX_JOBS=N) [1/2] /gpfslocalsys/cuda/11.2/bin/nvcc --generate-dependencies-with-compile --dependency-output layer_norm_cuda_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=fused_mix_prec_layer_norm_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include/TH -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include/THC -isystem /gpfslocalsys/cuda/11.2/include -isystem /gpfswork/rech/six/commun/conda/cutting-edge/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -O3 --use_fast_math -maxrregcount=50 -std=c++14 -c /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda_kernel.cu -o layer_norm_cuda_kernel.cuda.o [2/2] c++ layer_norm_cuda.o layer_norm_cuda_kernel.cuda.o -shared -L/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/gpfslocalsys/cuda/11.2/lib64 -lcudart -o fused_mix_prec_layer_norm_cuda.so Loading extension module fused_mix_prec_layer_norm_cuda... >>> done with compiling and loading fused kernels. 
Compilation time: 79.536 seconds time to initialize megatron (seconds): 69.913 [after megatron is initialized] datetime: 2022-02-04 20:36:37 [2022-02-04 20:36:37,914] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1 building GPT model ... [2022-02-04 20:36:37,914] [INFO] [partition_parameters.py:511:__init__] 
_all_gather_base API is not available in torch 1.8.1
[2022-02-04 20:36:37,944] [INFO] [utils.py:824:see_memory_usage] Before Building Model
[2022-02-04 20:36:37,944] [INFO] [utils.py:825:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2022-02-04 20:36:37,945] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 42.88 GB, percent = 8.5%
[2022-02-04 20:36:37,945] [INFO] [partition_parameters.py:511:__init__] _all_gather_base API is not available in torch 1.8.1
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=1, data=0, model=0): 4, ProcessCoord(pipe=1, data=0, model=1): 5, ProcessCoord(pipe=1, data=0, model=2): 6, ProcessCoord(pipe=1, data=0, model=3): 7, ProcessCoord(pipe=2, data=0, model=0): 8, ProcessCoord(pipe=2, data=0, model=1): 9, ProcessCoord(pipe=2, data=0, model=2): 10, ProcessCoord(pipe=2, data=0, model=3): 11, ProcessCoord(pipe=3, data=0, model=0): 12, ProcessCoord(pipe=3, data=0, model=1): 13, ProcessCoord(pipe=3, data=0, model=2): 14, ProcessCoord(pipe=3, data=0, model=3): 15, ProcessCoord(pipe=4, data=0, model=0): 16, ProcessCoord(pipe=4, data=0, model=1): 17, ProcessCoord(pipe=4, data=0, model=2): 18, ProcessCoord(pipe=4, data=0, model=3): 19, ProcessCoord(pipe=5, data=0, model=0): 20, ProcessCoord(pipe=5, data=0, model=1): 21, ProcessCoord(pipe=5, data=0, model=2): 22, ProcessCoord(pipe=5, data=0, model=3): 23, ProcessCoord(pipe=6, data=0, model=0): 24, ProcessCoord(pipe=6, data=0, model=1): 25, ProcessCoord(pipe=6, data=0, model=2): 26, ProcessCoord(pipe=6, data=0, model=3): 27, ProcessCoord(pipe=7, data=0, model=0): 28, ProcessCoord(pipe=7, data=0, model=1): 29, ProcessCoord(pipe=7, data=0, model=2): 30, ProcessCoord(pipe=7, data=0, model=3): 31, ProcessCoord(pipe=8, data=0, model=0): 32, ProcessCoord(pipe=8, data=0, 
model=1): 33, ProcessCoord(pipe=8, data=0, model=2): 34, ProcessCoord(pipe=8, data=0, model=3): 35, ProcessCoord(pipe=9, data=0, model=0): 36, ProcessCoord(pipe=9, data=0, model=1): 37, ProcessCoord(pipe=9, data=0, model=2): 38, ProcessCoord(pipe=9, data=0, model=3): 39, ProcessCoord(pipe=10, data=0, model=0): 40, ProcessCoord(pipe=10, data=0, model=1): 41, ProcessCoord(pipe=10, data=0, model=2): 42, ProcessCoord(pipe=10, data=0, model=3): 43, ProcessCoord(pipe=11, data=0, model=0): 44, ProcessCoord(pipe=11, data=0, model=1): 45, ProcessCoord(pipe=11, data=0, model=2): 46, ProcessCoord(pipe=11, data=0, model=3): 47, ProcessCoord(pipe=12, data=0, model=0): 48, ProcessCoord(pipe=12, data=0, model=1): 49, ProcessCoord(pipe=12, data=0, model=2): 50, ProcessCoord(pipe=12, data=0, model=3): 51, ProcessCoord(pipe=13, data=0, model=0): 52, ProcessCoord(pipe=13, data=0, model=1): 53, ProcessCoord(pipe=13, data=0, model=2): 54, ProcessCoord(pipe=13, data=0, model=3): 55, ProcessCoord(pipe=14, data=0, model=0): 56, ProcessCoord(pipe=14, data=0, model=1): 57, ProcessCoord(pipe=14, data=0, model=2): 58, ProcessCoord(pipe=14, data=0, model=3): 59, ProcessCoord(pipe=15, data=0, model=0): 60, ProcessCoord(pipe=15, data=0, model=1): 61, ProcessCoord(pipe=15, data=0, model=2): 62, ProcessCoord(pipe=15, data=0, model=3): 63, ProcessCoord(pipe=16, data=0, model=0): 64, ProcessCoord(pipe=16, data=0, model=1): 65, ProcessCoord(pipe=16, data=0, model=2): 66, ProcessCoord(pipe=16, data=0, model=3): 67, ProcessCoord(pipe=17, data=0, model=0): 68, ProcessCoord(pipe=17, data=0, model=1): 69, ProcessCoord(pipe=17, data=0, model=2): 70, ProcessCoord(pipe=17, data=0, model=3): 71, ProcessCoord(pipe=18, data=0, model=0): 72, ProcessCoord(pipe=18, data=0, model=1): 73, ProcessCoord(pipe=18, data=0, model=2): 74, ProcessCoord(pipe=18, data=0, model=3): 75, ProcessCoord(pipe=19, data=0, model=0): 76, ProcessCoord(pipe=19, data=0, model=1): 77, ProcessCoord(pipe=19, data=0, model=2): 78, 
ProcessCoord(pipe=19, data=0, model=3): 79, ProcessCoord(pipe=20, data=0, model=0): 80, ProcessCoord(pipe=20, data=0, model=1): 81, ProcessCoord(pipe=20, data=0, model=2): 82, ProcessCoord(pipe=20, data=0, model=3): 83, ProcessCoord(pipe=21, data=0, model=0): 84, ProcessCoord(pipe=21, data=0, model=1): 85, ProcessCoord(pipe=21, data=0, model=2): 86, ProcessCoord(pipe=21, data=0, model=3): 87, ProcessCoord(pipe=22, data=0, model=0): 88, ProcessCoord(pipe=22, data=0, model=1): 89, ProcessCoord(pipe=22, data=0, model=2): 90, ProcessCoord(pipe=22, data=0, model=3): 91, ProcessCoord(pipe=23, data=0, model=0): 92, ProcessCoord(pipe=23, data=0, model=1): 93, ProcessCoord(pipe=23, data=0, model=2): 94, ProcessCoord(pipe=23, data=0, model=3): 95, ProcessCoord(pipe=24, data=0, model=0): 96, ProcessCoord(pipe=24, data=0, model=1): 97, ProcessCoord(pipe=24, data=0, model=2): 98, ProcessCoord(pipe=24, data=0, model=3): 99, ProcessCoord(pipe=25, data=0, model=0): 100, ProcessCoord(pipe=25, data=0, model=1): 101, ProcessCoord(pipe=25, data=0, model=2): 102, ProcessCoord(pipe=25, data=0, model=3): 103, ProcessCoord(pipe=26, data=0, model=0): 104, ProcessCoord(pipe=26, data=0, model=1): 105, ProcessCoord(pipe=26, data=0, model=2): 106, ProcessCoord(pipe=26, data=0, model=3): 107, ProcessCoord(pipe=27, data=0, model=0): 108, ProcessCoord(pipe=27, data=0, model=1): 109, ProcessCoord(pipe=27, data=0, model=2): 110, ProcessCoord(pipe=27, data=0, model=3): 111, ProcessCoord(pipe=28, data=0, model=0): 112, ProcessCoord(pipe=28, data=0, model=1): 113, ProcessCoord(pipe=28, data=0, model=2): 114, ProcessCoord(pipe=28, data=0, model=3): 115, ProcessCoord(pipe=29, data=0, model=0): 116, ProcessCoord(pipe=29, data=0, model=1): 117, ProcessCoord(pipe=29, data=0, model=2): 118, ProcessCoord(pipe=29, data=0, model=3): 119, ProcessCoord(pipe=30, data=0, model=0): 120, ProcessCoord(pipe=30, data=0, model=1): 121, ProcessCoord(pipe=30, data=0, model=2): 122, ProcessCoord(pipe=30, data=0, model=3): 
123, ProcessCoord(pipe=31, data=0, model=0): 124, ProcessCoord(pipe=31, data=0, model=1): 125, ProcessCoord(pipe=31, data=0, model=2): 126, ProcessCoord(pipe=31, data=0, model=3): 127}
[2022-02-04 20:36:39,623] [INFO] [module.py:365:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=5
    0: _to_float16
    1: EmbeddingPipe
    2:
    3: ParallelTransformerLayerPipe
    4: ParallelTransformerLayerPipe
stage=1 layers=2
    5: ParallelTransformerLayerPipe
    6: ParallelTransformerLayerPipe
stage=2 layers=2
    7: ParallelTransformerLayerPipe
    8: ParallelTransformerLayerPipe
stage=3 layers=2
    9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
stage=4 layers=2
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
stage=5 layers=2
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=6 layers=2
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
stage=7 layers=2
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
stage=8 layers=2
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
stage=9 layers=2
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
stage=10 layers=2
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
stage=11 layers=2
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
stage=12 layers=2
    27: ParallelTransformerLayerPipe
    28: ParallelTransformerLayerPipe
stage=13 layers=2
    29: ParallelTransformerLayerPipe
    30: ParallelTransformerLayerPipe
stage=14 layers=2
    31: ParallelTransformerLayerPipe
    32: ParallelTransformerLayerPipe
stage=15 layers=2
    33: ParallelTransformerLayerPipe
    34: ParallelTransformerLayerPipe
stage=16 layers=2
    35: ParallelTransformerLayerPipe
    36: ParallelTransformerLayerPipe
stage=17 layers=2
    37: ParallelTransformerLayerPipe
    38: ParallelTransformerLayerPipe
stage=18 layers=2
    39: ParallelTransformerLayerPipe
    40: ParallelTransformerLayerPipe
stage=19 layers=2
    41: ParallelTransformerLayerPipe
    42: ParallelTransformerLayerPipe
stage=20 layers=2
    43: ParallelTransformerLayerPipe
    44: ParallelTransformerLayerPipe
stage=21 layers=2
    45: ParallelTransformerLayerPipe
    46: ParallelTransformerLayerPipe
stage=22 layers=2
    47: ParallelTransformerLayerPipe
    48: ParallelTransformerLayerPipe
stage=23 layers=2
    49: ParallelTransformerLayerPipe
    50: ParallelTransformerLayerPipe
stage=24 layers=2
    51: ParallelTransformerLayerPipe
    52: ParallelTransformerLayerPipe
stage=25 layers=2
    53: ParallelTransformerLayerPipe
    54: ParallelTransformerLayerPipe
stage=26 layers=2
    55: ParallelTransformerLayerPipe
    56: ParallelTransformerLayerPipe
stage=27 layers=2
    57: ParallelTransformerLayerPipe
    58: ParallelTransformerLayerPipe
stage=28 layers=2
    59: ParallelTransformerLayerPipe
    60: ParallelTransformerLayerPipe
stage=29 layers=2
    61: ParallelTransformerLayerPipe
    62: ParallelTransformerLayerPipe
stage=30 layers=2
    63: ParallelTransformerLayerPipe
    64: ParallelTransformerLayerPipe
stage=31 layers=6
    65: ParallelTransformerLayerPipe
    66: ParallelTransformerLayerPipe
    67:
    68: MixedFusedLayerNorm
    69: EmbeddingPipe
    70: float16_to_fp32
  loss: CrossEntropy
[2022-02-04 20:36:40,299] [INFO] [utils.py:824:see_memory_usage] After Building Model
[2022-02-04 20:36:40,300] [INFO] [utils.py:825:see_memory_usage] MA 1.88 GB Max_MA 1.88 GB CA 1.91 GB Max_CA 2 GB
[2022-02-04 20:36:40,300] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 43.23 GB, percent = 8.6%
setting training iterations to 292968
> learning rate decay style: cosine
DeepSpeed is enabled.
[2022-02-04 20:36:40,352] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed info: version=0.6.0+24fe7002, git-hash=24fe7002, git-branch=elastic-ckpt-refresh [2022-02-04 20:36:40,858] [INFO] [engine.py:275:__init__] DeepSpeed Flops Profiler Enabled: False [2022-02-04 20:36:40,858] [INFO] [engine.py:1093:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer [2022-02-04 20:36:40,858] [INFO] [engine.py:1099:_configure_optimizer] Using client Optimizer as basic optimizer [2022-02-04 20:36:40,858] [INFO] [engine.py:1115:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam [2022-02-04 20:36:40,858] [INFO] [utils.py:44:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type= [2022-02-04 20:36:40,858] [INFO] [logging.py:69:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer [2022-02-04 20:36:40,858] [INFO] [stage_1_and_2.py:121:__init__] Reduce bucket size 500000000 [2022-02-04 20:36:40,858] [INFO] [stage_1_and_2.py:122:__init__] Allgather bucket size 500000000 [2022-02-04 20:36:40,858] [INFO] [stage_1_and_2.py:123:__init__] CPU Offload: False [2022-02-04 20:36:40,858] [INFO] [stage_1_and_2.py:124:__init__] Round robin gradient partitioning: False Rank: 44 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 45 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 84 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 85 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 48 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 125 partition count [1, 1] and sizes[(978123600, False), (214600, False)] Rank: 34 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 123 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 122 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 35 partition count [1, 1] and sizes[(807360000, 
False), (179800, False)] Rank: 36 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 18 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 19 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 11 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 10 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 62 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 37 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 63 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 124 partition count [1, 1] and sizes[(978123600, False), (214600, False)] Rank: 112 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 113 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 41 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 73 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 40 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 72 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 96 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 97 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 7 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 6 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 60 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 91 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 49 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 90 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 126 partition count [1, 1] and sizes[(978123600, False), (214600, False)] Rank: 127 partition count [1, 1] and sizes[(978123600, False), (214600, 
False)] Rank: 61 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 119 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 25 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 118 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 24 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 15 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 78 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 14 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 29 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 28 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 79 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 110 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 83 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 80 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 22 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 12 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 82 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 13 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 111 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 81 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 23 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 21 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 8 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 33 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 32 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 20 
partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 93 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 9 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 86 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 107 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 46 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 87 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 106 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 114 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 56 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 47 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 57 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 105 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 59 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 115 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 58 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 16 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 104 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 108 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 17 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 109 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 77 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 76 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 39 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 38 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 68 partition count 
[1, 1] and sizes[(807360000, False), (179800, False)] Rank: 121 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 120 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 51 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 50 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 69 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 70 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 71 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 92 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 95 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 94 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 54 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 55 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 103 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 102 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 26 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 27 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 74 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 64 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 75 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 65 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 100 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 101 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 31 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 30 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 98 partition count [1, 1] and 
sizes[(807360000, False), (179800, False)] Rank: 99 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 66 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 67 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 43 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 42 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 4 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 5 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 1 partition count [1, 1] and sizes[(978123600, False), (191400, False)] Rank: 52 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 53 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 0 partition count [1, 1] and sizes[(978123600, False), (191400, False)] Rank: 3 partition count [1, 1] and sizes[(978123600, False), (191400, False)] Rank: 2 partition count [1, 1] and sizes[(978123600, False), (191400, False)] Rank: 117 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 116 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 89 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 88 partition count [1, 1] and sizes[(807360000, False), (179800, False)] [2022-02-04 20:36:44,761] [INFO] [utils.py:824:see_memory_usage] Before initializing optimizer states [2022-02-04 20:36:44,761] [INFO] [utils.py:825:see_memory_usage] MA 5.47 GB Max_MA 7.29 GB CA 9.25 GB Max_CA 9 GB [2022-02-04 20:36:44,762] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 43.1 GB, percent = 8.6% [2022-02-04 20:36:44,840] [INFO] [utils.py:824:see_memory_usage] After initializing optimizer states [2022-02-04 20:36:44,841] [INFO] [utils.py:825:see_memory_usage] MA 12.76 GB Max_MA 16.41 GB CA 20.19 GB Max_CA 20 GB [2022-02-04 20:36:44,841] [INFO] [utils.py:833:see_memory_usage] 
CPU Virtual Memory: used = 43.1 GB, percent = 8.6% [2022-02-04 20:36:44,841] [INFO] [stage_1_and_2.py:493:__init__] optimizer state initialized [2022-02-04 20:36:44,860] [INFO] [utils.py:824:see_memory_usage] After initializing ZeRO optimizer [2022-02-04 20:36:44,860] [INFO] [utils.py:825:see_memory_usage] MA 12.76 GB Max_MA 12.76 GB CA 20.19 GB Max_CA 20 GB [2022-02-04 20:36:44,860] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 43.1 GB, percent = 8.6% [2022-02-04 20:36:44,860] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam [2022-02-04 20:36:44,860] [INFO] [engine.py:805:_configure_lr_scheduler] DeepSpeed using client LR scheduler [2022-02-04 20:36:44,860] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed LR Scheduler = [2022-02-04 20:36:44,860] [INFO] [logging.py:69:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.95), (0.9, 0.95)] [2022-02-04 20:36:44,861] [INFO] [config.py:1058:print] DeepSpeedEngine configuration: [2022-02-04 20:36:44,861] [INFO] [config.py:1062:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2022-02-04 20:36:44,861] [INFO] [config.py:1062:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2022-02-04 20:36:44,861] [INFO] [config.py:1062:print] amp_enabled .................. False [2022-02-04 20:36:44,861] [INFO] [config.py:1062:print] amp_params ................... False [2022-02-04 20:36:44,861] [INFO] [config.py:1062:print] autotuning_config ............ 
{ "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": null, "exps_dir": null, "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 } [2022-02-04 20:36:44,861] [INFO] [config.py:1062:print] bfloat16_enabled ............. False [2022-02-04 20:36:44,861] [INFO] [config.py:1062:print] checkpoint_tag_validation_enabled True [2022-02-04 20:36:44,861] [INFO] [config.py:1062:print] checkpoint_tag_validation_fail False [2022-02-04 20:36:44,861] [INFO] [config.py:1062:print] communication_data_type ...... None [2022-02-04 20:36:44,861] [INFO] [config.py:1062:print] curriculum_enabled ........... True [2022-02-04 20:36:44,861] [INFO] [config.py:1062:print] curriculum_params ............ {'curriculum_type': 'seqlen', 'min_difficulty': 64, 'max_difficulty': 2048, 'schedule_type': 'fixed_linear', 'schedule_config': {'total_curriculum_step': 36000, 'difficulty_step': 8}} [2022-02-04 20:36:44,861] [INFO] [config.py:1062:print] dataloader_drop_last ......... False [2022-02-04 20:36:44,861] [INFO] [config.py:1062:print] disable_allgather ............ False [2022-02-04 20:36:44,861] [INFO] [config.py:1062:print] dump_state ................... False [2022-02-04 20:36:44,861] [INFO] [config.py:1062:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1} [2022-02-04 20:36:44,861] [INFO] [config.py:1062:print] eigenvalue_enabled ........... 
False [2022-02-04 20:36:44,861] [INFO] [config.py:1062:print] eigenvalue_gas_boundary_resolution 1 [2022-02-04 20:36:44,861] [INFO] [config.py:1062:print] eigenvalue_layer_name ........ bert.encoder.layer [2022-02-04 20:36:44,861] [INFO] [config.py:1062:print] eigenvalue_layer_num ......... 0 [2022-02-04 20:36:44,861] [INFO] [config.py:1062:print] eigenvalue_max_iter .......... 100 [2022-02-04 20:36:44,861] [INFO] [config.py:1062:print] eigenvalue_stability ......... 1e-06 [2022-02-04 20:36:44,861] [INFO] [config.py:1062:print] eigenvalue_tol ............... 0.01 [2022-02-04 20:36:44,861] [INFO] [config.py:1062:print] eigenvalue_verbose ........... False [2022-02-04 20:36:44,861] [INFO] [config.py:1062:print] elasticity_enabled ........... False [2022-02-04 20:36:44,861] [INFO] [config.py:1062:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2022-02-04 20:36:44,861] [INFO] [config.py:1062:print] fp16_enabled ................. True [2022-02-04 20:36:44,861] [INFO] [config.py:1062:print] fp16_master_weights_and_gradients False [2022-02-04 20:36:44,861] [INFO] [config.py:1062:print] fp16_mixed_quantize .......... False [2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] global_rank .................. 0 [2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] gradient_accumulation_steps .. 2048 [2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] gradient_clipping ............ 1.0 [2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] gradient_predivide_factor .... 1.0 [2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] initial_dynamic_scale ........ 4096 [2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] loss_scale ................... 0 [2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] memory_breakdown ............. False [2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] optimizer_legacy_fusion ...... 
False
[2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] optimizer_name ............... None
[2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] optimizer_params ............. None
[2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] pld_enabled .................. False
[2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] pld_params ................... False
[2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] prescale_gradients ........... False
[2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] quantize_change_rate ......... 0.001
[2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] quantize_groups .............. 1
[2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] quantize_offset .............. 1000
[2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] quantize_period .............. 1000
[2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] quantize_rounding ............ 0
[2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] quantize_start_bits .......... 16
[2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] quantize_target_bits ......... 8
[2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] quantize_training_enabled .... False
[2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] quantize_type ................ 0
[2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] quantize_verbose ............. False
[2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] scheduler_name ............... None
[2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] scheduler_params ............. None
[2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] sparse_attention ............. None
[2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] sparse_gradients_enabled ..... False
[2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] steps_per_print .............. 2000
[2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] tensorboard_enabled .......... False
[2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] tensorboard_job_name ......... DeepSpeedJobName
[2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] tensorboard_output_path ......
[2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] train_batch_size ............. 2048
[2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] train_micro_batch_size_per_gpu 1
[2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] use_quantizer_kernel ......... False
[2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] wall_clock_breakdown ......... False
[2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] world_size ................... 1
[2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] zero_allow_untested_optimizer False
[2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] zero_config .................. {
    "stage": 1,
    "contiguous_gradients": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 5.000000e+08,
    "allgather_partitions": true,
    "allgather_bucket_size": 5.000000e+08,
    "overlap_comm": false,
    "load_from_fp32_weights": true,
    "elastic_checkpoint": false,
    "offload_param": null,
    "offload_optimizer": null,
    "sub_group_size": 1.000000e+09,
    "prefetch_bucket_size": 5.000000e+07,
    "param_persistence_threshold": 1.000000e+05,
    "max_live_parameters": 1.000000e+09,
    "max_reuse_distance": 1.000000e+09,
    "gather_fp16_weights_on_model_save": false,
    "ignore_unused_parameters": true,
    "round_robin_gradients": false,
    "legacy_stage1": false
}
[2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] zero_enabled ................. True
[2022-02-04 20:36:44,862] [INFO] [config.py:1062:print] zero_optimization_stage ...... 1
[2022-02-04 20:36:44,862] [INFO] [config.py:1064:print] json = {
    "train_micro_batch_size_per_gpu": 1,
    "train_batch_size": 2.048000e+03,
    "gradient_clipping": 1.0,
    "elastic_checkpoint": true,
    "zero_optimization": {
        "stage": 1
    },
    "fp16": {
        "enabled": true,
        "loss_scale": 0,
        "loss_scale_window": 500,
        "hysteresis": 2,
        "min_loss_scale": 1,
        "initial_scale_power": 12
    },
    "curriculum_learning": {
        "enabled": true,
        "curriculum_type": "seqlen",
        "min_difficulty": 64,
        "max_difficulty": 2.048000e+03,
        "schedule_type": "fixed_linear",
        "schedule_config": {
            "total_curriculum_step": 3.600000e+04,
            "difficulty_step": 8
        }
    },
    "steps_per_print": 2.000000e+03,
    "wall_clock_breakdown": false
}
[2022-02-04 20:36:44,862] [INFO] [engine.py:93:__init__] CONFIG: micro_batches=2048 micro_batch_size=1
[2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=2 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=6 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=3 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=7 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=5 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=0 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000
(104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=4 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=1 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=64 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=66 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=70 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=67 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=65 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=99 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=98 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=103 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 
(807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=68 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=69 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=71 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=37 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=33 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=39 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=36 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=102 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=96 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] 
[engine.py:151:__init__] RANK=101 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=38 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=35 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=32 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=97 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=85 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=80 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=34 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=21 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=18 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 
(104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=17 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=22 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=115 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=112 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=119 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=83 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=84 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=51 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=52 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=49 
STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=54 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=19 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=16 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=20 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=23 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=120 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=126 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=121 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=124 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 
(104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=116 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=117 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=114 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=86 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=81 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=82 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=100 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=55 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=53 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=89 STAGE=22 LAYERS=2 [47, 49) 
STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=93 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=94 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=91 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=118 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=104 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=106 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=109 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=111 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=9 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 
20:36:47,165] [INFO] [engine.py:151:__init__] RANK=13 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=11 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=50 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=48 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=122 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=88 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=95 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=113 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=27 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=29 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) 
TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=26 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=25 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=87 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=74 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=76 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=105 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=107 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=10 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=14 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] 
[engine.py:151:__init__] RANK=8 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=15 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=125 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=127 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=90 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=58 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=63 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=61 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=30 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=44 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 
(104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=42 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=45 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=43 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=75 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=78 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=77 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=79 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=108 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=12 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=123 
STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=92 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=62 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=57 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=60 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=24 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=28 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=46 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,164] [INFO] [engine.py:151:__init__] RANK=72 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=73 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 
(104048.288M)
[2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=110 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=56 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=31 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=47 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=40 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=59 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-04 20:36:47,165] [INFO] [engine.py:151:__init__] RANK=41 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
> using checkpoint value 6e-05 for learning rate
> using checkpoint value 6e-06 for minimum learning rate
> using checkpoint value 3750000 for warmup iterations
> using checkpoint value 600000000 for total number of iterations
> using checkpoint value cosine for decay style
[2022-02-04 20:37:13,733] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 104
[2022-02-04 20:37:13,872] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 26
[2022-02-04 20:37:13,951] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 87 [2022-02-04 20:37:14,374] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 116 [2022-02-04 20:37:14,563] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 82 [2022-02-04 20:37:15,141] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 94 [2022-02-04 20:37:15,741] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 104 [2022-02-04 20:37:15,792] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 87 [2022-02-04 20:37:15,806] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 117 [2022-02-04 20:37:15,871] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 116 [2022-02-04 20:37:15,885] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 26 [2022-02-04 20:37:15,975] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 82 [2022-02-04 20:37:16,123] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 60 [2022-02-04 20:37:16,151] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 106 [2022-02-04 20:37:16,377] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 121 [2022-02-04 20:37:16,435] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 63 [2022-02-04 20:37:16,544] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 40 [2022-02-04 20:37:16,579] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 37 [2022-02-04 
20:37:16,611] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 103 [2022-02-04 20:37:16,812] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 94 [2022-02-04 20:37:16,928] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 4 [2022-02-04 20:37:17,249] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 117 [2022-02-04 20:37:17,348] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 97 [2022-02-04 20:37:17,601] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 106 [2022-02-04 20:37:17,691] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 31 [2022-02-04 20:37:17,783] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 60 [2022-02-04 20:37:17,844] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 63 [2022-02-04 20:37:17,877] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 18 [2022-02-04 20:37:17,923] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 121 [2022-02-04 20:37:18,054] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 40 [2022-02-04 20:37:18,112] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 37 [2022-02-04 20:37:18,171] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 33 [2022-02-04 20:37:18,260] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 115 [2022-02-04 20:37:18,393] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 4 [2022-02-04 20:37:18,527] [INFO] 
[engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 36 [2022-02-04 20:37:18,788] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 97 [2022-02-04 20:37:18,810] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 103 [2022-02-04 20:37:18,923] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 65 [2022-02-04 20:37:19,007] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 39 [2022-02-04 20:37:19,008] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 41 [2022-02-04 20:37:19,100] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 31 [2022-02-04 20:37:19,356] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 18 [2022-02-04 20:37:19,364] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 56 [2022-02-04 20:37:19,497] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 78 [2022-02-04 20:37:19,548] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 15 [2022-02-04 20:37:19,613] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 45 [2022-02-04 20:37:19,622] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 33 [2022-02-04 20:37:19,651] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 88 [2022-02-04 20:37:19,680] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 115 [2022-02-04 20:37:19,712] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 61 [2022-02-04 20:37:19,716] [INFO] 
[engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 23 [2022-02-04 20:37:19,760] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 105 [2022-02-04 20:37:19,787] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 8 [2022-02-04 20:37:19,966] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 68 [2022-02-04 20:37:19,983] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 36 [2022-02-04 20:37:20,031] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 43 [2022-02-04 20:37:20,337] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 99 [2022-02-04 20:37:20,429] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 41 [2022-02-04 20:37:20,489] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 65 [2022-02-04 20:37:20,502] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 39 [2022-02-04 20:37:20,788] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 56 [2022-02-04 20:37:20,870] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 92 [2022-02-04 20:37:20,974] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 114 [2022-02-04 20:37:20,980] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 30 [2022-02-04 20:37:21,019] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 78 [2022-02-04 20:37:21,103] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 88 [2022-02-04 20:37:21,105] [INFO] 
[engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 23 [2022-02-04 20:37:21,113] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 45 [2022-02-04 20:37:21,116] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 16 [2022-02-04 20:37:21,146] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 61 [2022-02-04 20:37:21,200] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 17 [2022-02-04 20:37:21,204] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 110 [2022-02-04 20:37:21,251] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 19 [2022-02-04 20:37:21,252] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 105 [2022-02-04 20:37:21,339] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 68 [2022-02-04 20:37:21,441] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 8 [2022-02-04 20:37:21,488] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 15 [2022-02-04 20:37:21,608] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 43 [2022-02-04 20:37:21,678] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 7 [2022-02-04 20:37:21,726] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 99 [2022-02-04 20:37:21,908] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 13 [2022-02-04 20:37:22,124] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 54 [2022-02-04 20:37:22,237] [INFO] 
[engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 92 [2022-02-04 20:37:22,332] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 48 [2022-02-04 20:37:22,351] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 114 [2022-02-04 20:37:22,355] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 30 [2022-02-04 20:37:22,470] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 95 [2022-02-04 20:37:22,503] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 71 [2022-02-04 20:37:22,541] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 73 [2022-02-04 20:37:22,591] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 110 [2022-02-04 20:37:22,742] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 46 [2022-02-04 20:37:22,769] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 19 [2022-02-04 20:37:22,776] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 16 [2022-02-04 20:37:22,790] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 22 [2022-02-04 20:37:22,844] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 74 [2022-02-04 20:37:22,844] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 75 [2022-02-04 20:37:22,961] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 17 [2022-02-04 20:37:23,050] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 58 [2022-02-04 20:37:23,078] [INFO] 
[engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 7 [2022-02-04 20:37:23,165] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 49 [2022-02-04 20:37:23,309] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 13 [2022-02-04 20:37:23,590] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 109 [2022-02-04 20:37:23,668] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 54 [2022-02-04 20:37:23,682] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 34 [2022-02-04 20:37:23,692] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 101 [2022-02-04 20:37:23,738] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 113 [2022-02-04 20:37:23,825] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 48 [2022-02-04 20:37:23,840] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 28 [2022-02-04 20:37:23,879] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 71 [2022-02-04 20:37:23,909] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 123 [2022-02-04 20:37:24,006] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 95 [2022-02-04 20:37:24,145] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 73 [2022-02-04 20:37:24,145] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 127 [2022-02-04 20:37:24,164] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 46 [2022-02-04 20:37:24,360] [INFO] 
[engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 22 [2022-02-04 20:37:24,446] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 58 [2022-02-04 20:37:24,472] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 70 [2022-02-04 20:37:24,557] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 25 [2022-02-04 20:37:24,610] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 49 [2022-02-04 20:37:24,634] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 91 [2022-02-04 20:37:24,646] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 74 [2022-02-04 20:37:24,658] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 75 [2022-02-04 20:37:24,665] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 24 [2022-02-04 20:37:24,858] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 44 [2022-02-04 20:37:24,892] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 77 [2022-02-04 20:37:24,896] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 52 [2022-02-04 20:37:24,942] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 96 [2022-02-04 20:37:24,944] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 109 [2022-02-04 20:37:25,032] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 5 [2022-02-04 20:37:25,181] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 101 [2022-02-04 20:37:25,292] [INFO] 
[engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 28 [2022-02-04 20:37:25,324] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 123 [2022-02-04 20:37:25,396] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 113 [2022-02-04 20:37:25,440] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 98 [2022-02-04 20:37:25,482] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 85 [2022-02-04 20:37:25,493] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 34 [2022-02-04 20:37:25,523] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 84 [2022-02-04 20:37:25,878] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 107 [2022-02-04 20:37:25,911] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 70 [2022-02-04 20:37:25,963] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 127 [2022-02-04 20:37:25,989] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 91 [2022-02-04 20:37:25,998] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 124 [2022-02-04 20:37:26,037] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 55 [2022-02-04 20:37:26,115] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 76 [2022-02-04 20:37:26,155] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 80 [2022-02-04 20:37:26,245] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 25 [2022-02-04 20:37:26,260] [INFO] 
[engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 44 [2022-02-04 20:37:26,272] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 77 [2022-02-04 20:37:26,318] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 52 [2022-02-04 20:37:26,353] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 24 [2022-02-04 20:37:26,418] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 5 [2022-02-04 20:37:26,437] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 11 [2022-02-04 20:37:26,444] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 96 [2022-02-04 20:37:26,909] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 79 [2022-02-04 20:37:26,916] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 98 [2022-02-04 20:37:26,975] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 50 [2022-02-04 20:37:27,126] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 125 [2022-02-04 20:37:27,216] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 85 [2022-02-04 20:37:27,253] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 84 [2022-02-04 20:37:27,269] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 107 [2022-02-04 20:37:27,493] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 55 [2022-02-04 20:37:27,499] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 76 [2022-02-04 20:37:27,525] [INFO] [engine.py:2672:_get_all_zero_checkpoints] 
successfully read 1 ZeRO state_dicts for rank 42 [2022-02-04 20:37:27,560] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 89 [2022-02-04 20:37:27,592] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 102 [2022-02-04 20:37:27,717] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 124 [2022-02-04 20:37:27,721] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 80 [2022-02-04 20:37:27,762] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 38 [2022-02-04 20:37:27,897] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 11 [2022-02-04 20:37:28,009] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 81 [2022-02-04 20:37:28,030] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 29 [2022-02-04 20:37:28,047] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 14 [2022-02-04 20:37:28,305] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 79 [2022-02-04 20:37:28,351] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 50 [2022-02-04 20:37:28,551] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 112 [2022-02-04 20:37:28,786] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 125 [2022-02-04 20:37:28,895] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 21 [2022-02-04 20:37:28,920] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 32 [2022-02-04 20:37:28,929] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero 
partition checkpoints for rank 89 [2022-02-04 20:37:28,947] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 42 [2022-02-04 20:37:28,997] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 102 [2022-02-04 20:37:29,046] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 0 [2022-02-04 20:37:29,136] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 38 [2022-02-04 20:37:29,171] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 66 [2022-02-04 20:37:29,275] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 35 [2022-02-04 20:37:29,348] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 118 [2022-02-04 20:37:29,392] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 81 [2022-02-04 20:37:29,444] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 14 [2022-02-04 20:37:29,464] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 67 [2022-02-04 20:37:29,505] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 29 [2022-02-04 20:37:29,605] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 6 [2022-02-04 20:37:29,963] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 62 [2022-02-04 20:37:29,980] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 112 [2022-02-04 20:37:30,274] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 21 [2022-02-04 20:37:30,325] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 32 
[2022-02-04 20:37:30,340] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 83 [2022-02-04 20:37:30,509] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 72 [2022-02-04 20:37:30,678] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 35 [2022-02-04 20:37:30,683] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 66 [2022-02-04 20:37:30,753] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 0 checkpoint version 3.0 [2022-02-04 20:37:30,780] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 118 [2022-02-04 20:37:30,800] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 53 [2022-02-04 20:37:30,861] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 64 [2022-02-04 20:37:30,969] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 67 [2022-02-04 20:37:31,039] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 6 [2022-02-04 20:37:31,047] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 126 [2022-02-04 20:37:31,353] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 122 [2022-02-04 20:37:31,363] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 62 [2022-02-04 20:37:31,757] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 83 [2022-02-04 20:37:31,868] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 72 [2022-02-04 20:37:31,878] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 9 [2022-02-04 
20:37:32,245] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 53 [2022-02-04 20:37:32,278] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 12 [2022-02-04 20:37:32,468] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 64 [2022-02-04 20:37:32,737] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 122 [2022-02-04 20:37:32,779] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 126 [2022-02-04 20:37:32,782] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 119 [2022-02-04 20:37:33,011] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 3 [2022-02-04 20:37:33,273] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 9 [2022-02-04 20:37:33,341] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 10 [2022-02-04 20:37:33,706] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 12 [2022-02-04 20:37:34,267] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 20 [2022-02-04 20:37:34,339] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 119 [2022-02-04 20:37:34,380] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 59 [2022-02-04 20:37:34,406] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 111 [2022-02-04 20:37:34,704] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 3 [2022-02-04 20:37:34,749] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 10 [2022-02-04 20:37:34,842] [INFO] 
[engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 51 [2022-02-04 20:37:35,012] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 120 [2022-02-04 20:37:35,144] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 27 [2022-02-04 20:37:35,631] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 20 [2022-02-04 20:37:35,753] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 59 [2022-02-04 20:37:35,809] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 111 [2022-02-04 20:37:36,220] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 51 [2022-02-04 20:37:36,388] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 120 [2022-02-04 20:37:36,517] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 27 [2022-02-04 20:37:36,825] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 57 [2022-02-04 20:37:37,674] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 47 [2022-02-04 20:37:37,727] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 1 [2022-02-04 20:37:38,052] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 69 [2022-02-04 20:37:38,196] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 57 [2022-02-04 20:37:38,254] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 86 [2022-02-04 20:37:39,036] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 47 [2022-02-04 20:37:39,447] [INFO] 
[engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 69 [2022-02-04 20:37:39,473] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 90 [2022-02-04 20:37:39,556] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 108 [2022-02-04 20:37:39,718] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 1 [2022-02-04 20:37:39,737] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 86 [2022-02-04 20:37:40,261] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 100 [2022-02-04 20:37:40,510] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 93 [2022-02-04 20:37:40,853] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 90 [2022-02-04 20:37:40,892] [INFO] [engine.py:2672:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 2 [2022-02-04 20:37:40,933] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 108 [2022-02-04 20:37:41,650] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 100 [2022-02-04 20:37:41,870] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 93 [2022-02-04 20:37:42,549] [INFO] [engine.py:2602:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 2 successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 at iteration 16000 time (ms) | load-checkpoint: 54095.84 /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/utils.py:279: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter 
count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 125.22432 estimated model parameters without embeddings: 103.368064 [after model, optimizer, and learning rate scheduler are built] datetime: 2022-02-04 20:37:42 > building train, validation, and test datasets ... > datasets target sizes (minimum size): train: 600000000 validation: 20008960 test: 10240 > building train, validation, and test datasets for GPT ... > building dataset index ... reading sizes... reading pointers... reading document index... creating numpy buffer of mmap... creating memory view of numpy buffer... > finished creating indexed dataset in 5.238195 seconds number of documents: 304230423 > dataset split: train: document indices in [0, 288714672) total of 288714672 documents validation: document indices in [288714672, 303926193) total of 15211521 documents test: document indices in [303926193, 304230423) total of 304230 documents > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_doc_idx.npy > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_sample_idx.npy > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_shuffle_idx.npy loaded indexed file in 0.176 seconds total number of samples: 657686117 total number of epochs: 5 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_20008960ns_2048sl_43s_doc_idx.npy > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_20008960ns_2048sl_43s_sample_idx.npy > loading shuffle-idx mapping from 
/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_20008960ns_2048sl_43s_shuffle_idx.npy loaded indexed file in 0.141 seconds total number of samples: 20781483 total number of epochs: 3 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_doc_idx.npy > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_sample_idx.npy > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_shuffle_idx.npy loaded indexed file in 0.064 seconds total number of samples: 137384 total number of epochs: 1 > finished creating GPT datasets ... [after dataloaders are built] datetime: 2022-02-04 20:37:57 done with setup ... training ... Number of parameters: [tensor rank - pipeline rank] w/ and w/o embeddings: time (ms) | model-and-optimizer-setup: 64628.36 | train/valid/test-data-iterators-setup: 12868.25 [003-030] 103.3651B / 103.3651B[001-030] 103.3651B / 103.3651B [002-030] 103.3651B / 103.3651B [002-001] 103.3651B / 103.3651B [002-013] 103.3651B / 103.3651B[001-013] 103.3651B / 103.3651B [001-012] 103.3651B / 103.3651B [001-031] 125.2273B / 103.3710B[002-031] 125.2273B / 103.3710B [003-023] 103.3651B / 103.3651B [001-028] 103.3651B / 103.3651B [002-016] 103.3651B / 103.3651B[003-016] 103.3651B / 103.3651B[002-017] 103.3651B / 103.3651B [003-014] 103.3651B / 103.3651B [001-015] 103.3651B / 103.3651B [002-006] 103.3651B / 103.3651B[001-006] 103.3651B / 103.3651B [002-007] 103.3651B / 103.3651B[001-007] 103.3651B / 103.3651B [001-021] 103.3651B / 103.3651B [003-010] 103.3651B / 103.3651B [002-010] 103.3651B / 103.3651B [003-018] 103.3651B / 103.3651B[003-019] 103.3651B / 103.3651B[001-018] 103.3651B / 103.3651B [001-019] 103.3651B / 103.3651B [003-004] 103.3651B / 103.3651B[002-004] 103.3651B / 
103.3651B [001-008] 103.3651B / 103.3651B[002-009] 103.3651B / 103.3651B[001-009] 103.3651B / 103.3651B [003-009] 103.3651B / 103.3651B [001-002] 103.3651B / 103.3651B [003-025] 103.3651B / 103.3651B[001-024] 103.3651B / 103.3651B [001-000] 125.2243B / 103.3681B [003-013] 103.3651B / 103.3651B[002-012] 103.3651B / 103.3651B [003-031] 125.2273B / 103.3710B [003-022] 103.3651B / 103.3651B[002-023] 103.3651B / 103.3651B [002-022] 103.3651B / 103.3651B [003-029] 103.3651B / 103.3651B[002-029] 103.3651B / 103.3651B [003-028] 103.3651B / 103.3651B [003-017] 103.3651B / 103.3651B [001-014] 103.3651B / 103.3651B [002-014] 103.3651B / 103.3651B [002-015] 103.3651B / 103.3651B [003-015] 103.3651B / 103.3651B [003-007] 103.3651B / 103.3651B [002-021] 103.3651B / 103.3651B [002-020] 103.3651B / 103.3651B [002-011] 103.3651B / 103.3651B [002-018] 103.3651B / 103.3651B [003-005] 103.3651B / 103.3651B [003-008] 103.3651B / 103.3651B [003-026] 103.3651B / 103.3651B [002-027] 103.3651B / 103.3651B [002-002] 103.3651B / 103.3651B[002-003] 103.3651B / 103.3651B [002-024] 103.3651B / 103.3651B[001-025] 103.3651B / 103.3651B [002-025] 103.3651B / 103.3651B [001-001] 103.3651B / 103.3651B [003-012] 103.3651B / 103.3651B [001-023] 103.3651B / 103.3651B[001-022] 103.3651B / 103.3651B [001-029] 103.3651B / 103.3651B [002-028] 103.3651B / 103.3651B [001-017] 103.3651B / 103.3651B [001-016] 103.3651B / 103.3651B [003-006] 103.3651B / 103.3651B [001-020] 103.3651B / 103.3651B [001-010] 103.3651B / 103.3651B[003-011] 103.3651B / 103.3651B [002-019] 103.3651B / 103.3651B [001-004] 103.3651B / 103.3651B [002-008] 103.3651B / 103.3651B [001-026] 103.3651B / 103.3651B [002-026] 103.3651B / 103.3651B[003-027] 103.3651B / 103.3651B [001-027] 103.3651B / 103.3651B [003-002] 103.3651B / 103.3651B [001-003] 103.3651B / 103.3651B[003-003] 103.3651B / 103.3651B [003-024] 103.3651B / 103.3651B [003-001] 103.3651B / 103.3651B [000-012] 103.3651B / 103.3651B [000-023] 103.3651B / 103.3651B [000-028] 
103.3651B / 103.3651B [000-017] 103.3651B / 103.3651B [000-015] 103.3651B / 103.3651B [000-007] 103.3651B / 103.3651B [003-020] 103.3651B / 103.3651B [001-011] 103.3651B / 103.3651B [000-019] 103.3651B / 103.3651B [001-005] 103.3651B / 103.3651B[002-005] 103.3651B / 103.3651B [000-008] 103.3651B / 103.3651B [000-027] 103.3651B / 103.3651B [000-002] 103.3651B / 103.3651B [000-025] 103.3651B / 103.3651B[000-024] 103.3651B / 103.3651B [003-000] 125.2243B / 103.3681B [000-013] 103.3651B / 103.3651B [000-031] 125.2273B / 103.3710B [000-022] 103.3651B / 103.3651B [000-029] 103.3651B / 103.3651B [000-016] 103.3651B / 103.3651B [000-006] 103.3651B / 103.3651B [003-021] 103.3651B / 103.3651B [000-011] 103.3651B / 103.3651B [000-005] 103.3651B / 103.3651B [000-009] 103.3651B / 103.3651B [000-026] 103.3651B / 103.3651B [000-003] 103.3651B / 103.3651B [002-000] 125.2243B / 103.3681B [000-020] 103.3651B / 103.3651B [000-010] 103.3651B / 103.3651B [000-018] 103.3651B / 103.3651B [000-001] 103.3651B / 103.3651B [000-000] 125.2243B / 103.3681B [000-004] 103.3651B / 103.3651B [000-021] 103.3651B / 103.3651B [000-014] 103.3651B / 103.3651B [000-030] 103.3651B / 103.3651B [before the start of training step] datetime: 2022-02-04 20:37:57 [2022-02-04 20:37:57,519] [INFO] [checkpointing.py:547:forward] Activation Checkpointing Information [2022-02-04 20:37:57,519] [INFO] [checkpointing.py:548:forward] ----Partition Activations False, CPU CHECKPOINTING False [2022-02-04 20:37:57,519] [INFO] [checkpointing.py:551:forward] ----contiguous Memory Checkpointing False with 64 total layers [2022-02-04 20:37:57,519] [INFO] [checkpointing.py:554:forward] ----Synchronization False [2022-02-04 20:37:57,519] [INFO] [checkpointing.py:555:forward] ----Profiling time in checkpointing False [Rank 4] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 124] (after 16001 iterations) memory (MB) | allocated: 
13250.3173828125 | max allocated: 20714.298828125 | reserved: 24404.0 | max reserved: 24404.0 [Rank 0] (after 16001 iterations) memory (MB) | allocated: 13207.30712890625 | max allocated: 20670.92333984375 | reserved: 24404.0 | max reserved: 24404.0 [Rank 8] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 24] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 16] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.66943359375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 28] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 20] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 36] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 32] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 12] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 1] (after 16001 iterations) memory (MB) | allocated: 13207.4208984375 | max allocated: 20671.037109375 | reserved: 24404.0 | max reserved: 24404.0 [Rank 5] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.177734375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 9] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 125] (after 16001 
iterations) memory (MB) | allocated: 13250.3173828125 | max allocated: 20713.978515625 | reserved: 24404.0 | max reserved: 24404.0 [Rank 17] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 21] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 13] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 126] (after 16001 iterations) memory (MB) | allocated: 13249.9501953125 | max allocated: 20714.59033203125 | reserved: 24404.0 | max reserved: 24404.0 [Rank 2] (after 16001 iterations) memory (MB) | allocated: 13207.4208984375 | max allocated: 20671.037109375 | reserved: 24404.0 | max reserved: 24404.0 [Rank 18] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 6] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 22] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 11] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 3] (after 16001 iterations) memory (MB) | allocated: 13207.4208984375 | max allocated: 20671.037109375 | reserved: 24404.0 | max reserved: 24404.0 [Rank 10] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 14] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 
20072.0 [Rank 7] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 31] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 23] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 19] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 15] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 35] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 27] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.8017578125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 39] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 44] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 40] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 60] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 48] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 64] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 
16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 80] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 52] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 92] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 68] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 76] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 56] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 88] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 84] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 72] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 120] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 112] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 100] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 96] (after 16001 iterations) memory (MB) | 
allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 108] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.5615234375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 104] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 116] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 51] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 55] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 47] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 63] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 67] (after 16001 iterations) memory (MB) | allocated: 10797.18408203125 | max allocated: 16957.36572265625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 59] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 71] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 43] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 79] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 
83] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 75] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 87] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 33] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 30] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 38] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 37] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 29] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 34] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 25] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 26] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 41] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 42] (after 16001 iterations) memory (MB) | allocated: 10797.65087890625 | max allocated: 16957.83251953125 | 
reserved: 20072.0 | max reserved: 20072.0 [Rank 53] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 49] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.8017578125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 54] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 45] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 50] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 46] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.5615234375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 62] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 61] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 58] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 70] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 66] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 82] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 57] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 
| max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 78] (after 16001 iterations) memory (MB) | allocated: 10797.18408203125 | max allocated: 16957.36572265625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 93] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 73] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 77] (after 16001 iterations) memory (MB) | allocated: 10797.18408203125 | max allocated: 16957.36572265625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 69] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 86] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 65] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.5615234375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 85] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 81] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 89] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 74] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 113] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 101] (after 16001 iterations) 
memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 97] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 121] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 105] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 109] (after 16001 iterations) memory (MB) | allocated: 10797.18408203125 | max allocated: 16957.896484375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 117] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 91] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 94] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 115] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 119] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 107] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 95] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 103] (after 16001 iterations) memory (MB) | allocated: 10797.18408203125 | max allocated: 16957.36572265625 | reserved: 20072.0 | max 
reserved: 20072.0 [Rank 98] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 iteration 16001/ 292968 | consumed samples: 32770048 | consumed tokens: 16416194560 | elapsed time per iteration (ms): 266277.5 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.686289E+00 | loss scale: 65536.0 | grad norm: 50482.045 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.008 | TFLOPs: 46.91 | [Rank 111] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.5615234375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 99] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 90] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 114] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 102] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 106] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 110] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.5615234375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 127] (after 16001 iterations) memory (MB) | allocated: 13249.9501953125 | max allocated: 20713.931640625 | reserved: 24404.0 | max reserved: 24404.0 [Rank 118] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 
123] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 122] (after 16001 iterations) memory (MB) | allocated: 10796.84912109375 | max allocated: 16957.03076171875 | reserved: 20072.0 | max reserved: 20072.0 iteration 16002/ 292968 | consumed samples: 32772096 | consumed tokens: 16418127872 | elapsed time per iteration (ms): 135201.5 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.672548E+00 | loss scale: 65536.0 | grad norm: 57881.143 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 92.38 | iteration 16003/ 292968 | consumed samples: 32774144 | consumed tokens: 16420061184 | elapsed time per iteration (ms): 130129.1 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.667671E+00 | loss scale: 65536.0 | grad norm: 64296.804 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.99 | iteration 16004/ 292968 | consumed samples: 32776192 | consumed tokens: 16421994496 | elapsed time per iteration (ms): 130103.5 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.703609E+00 | loss scale: 65536.0 | grad norm: 72308.133 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.00 | iteration 16005/ 292968 | consumed samples: 32778240 | consumed tokens: 16423927808 | elapsed time per iteration (ms): 127595.6 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.680434E+00 | loss scale: 65536.0 | grad norm: 60224.024 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.89 | iteration 16006/ 292968 | consumed samples: 32780288 | consumed tokens: 
16425861120 | elapsed time per iteration (ms): 124561.1 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.674761E+00 | loss scale: 65536.0 | grad norm: 45979.970 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 100.28 | iteration 16007/ 292968 | consumed samples: 32782336 | consumed tokens: 16427794432 | elapsed time per iteration (ms): 125022.6 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.672683E+00 | loss scale: 65536.0 | grad norm: 37847.507 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.91 | iteration 16008/ 292968 | consumed samples: 32784384 | consumed tokens: 16429727744 | elapsed time per iteration (ms): 127807.5 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.654385E+00 | loss scale: 65536.0 | grad norm: 45318.754 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.73 | iteration 16009/ 292968 | consumed samples: 32786432 | consumed tokens: 16431661056 | elapsed time per iteration (ms): 123686.7 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.678606E+00 | loss scale: 65536.0 | grad norm: 54746.382 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.99 | iteration 16010/ 292968 | consumed samples: 32788480 | consumed tokens: 16433594368 | elapsed time per iteration (ms): 127120.9 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.668453E+00 | loss scale: 65536.0 | grad norm: 57833.946 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.26 | iteration 16011/ 292968 | consumed samples: 32790528 | 
consumed tokens: 16435527680 | elapsed time per iteration (ms): 123835.6 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.695117E+00 | loss scale: 65536.0 | grad norm: 46302.152 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.86 | iteration 16012/ 292968 | consumed samples: 32792576 | consumed tokens: 16437460992 | elapsed time per iteration (ms): 130447.8 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.689630E+00 | loss scale: 65536.0 | grad norm: 50955.076 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.75 | iteration 16013/ 292968 | consumed samples: 32794624 | consumed tokens: 16439394304 | elapsed time per iteration (ms): 132529.2 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.663349E+00 | loss scale: 65536.0 | grad norm: 67540.276 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 94.25 | iteration 16014/ 292968 | consumed samples: 32796672 | consumed tokens: 16441327616 | elapsed time per iteration (ms): 131822.1 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.684647E+00 | loss scale: 65536.0 | grad norm: 57006.470 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 94.75 | iteration 16015/ 292968 | consumed samples: 32798720 | consumed tokens: 16443260928 | elapsed time per iteration (ms): 129576.7 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.664863E+00 | loss scale: 65536.0 | grad norm: 39598.650 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.39 | iteration 16016/ 292968 | consumed samples: 
32800768 | consumed tokens: 16445194240 | elapsed time per iteration (ms): 129272.8 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.678806E+00 | loss scale: 65536.0 | grad norm: 36068.169 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.62 | iteration 16017/ 292968 | consumed samples: 32802816 | consumed tokens: 16447127552 | elapsed time per iteration (ms): 128984.0 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.653593E+00 | loss scale: 65536.0 | grad norm: 48523.579 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.84 | iteration 16018/ 292968 | consumed samples: 32804864 | consumed tokens: 16449060864 | elapsed time per iteration (ms): 129059.0 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.706792E+00 | loss scale: 65536.0 | grad norm: 45744.156 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.78 | iteration 16019/ 292968 | consumed samples: 32806912 | consumed tokens: 16450994176 | elapsed time per iteration (ms): 129722.9 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.665133E+00 | loss scale: 65536.0 | grad norm: 33420.751 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.29 | iteration 16020/ 292968 | consumed samples: 32808960 | consumed tokens: 16452927488 | elapsed time per iteration (ms): 133081.1 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.677978E+00 | loss scale: 65536.0 | grad norm: 42586.035 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 93.86 | iteration 16021/ 292968 | 
consumed samples: 32811008 | consumed tokens: 16454860800 | elapsed time per iteration (ms): 133108.0 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.689134E+00 | loss scale: 65536.0 | grad norm: 50709.886 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 93.84 | iteration 16022/ 292968 | consumed samples: 32813056 | consumed tokens: 16456794112 | elapsed time per iteration (ms): 134869.8 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.693937E+00 | loss scale: 65536.0 | grad norm: 57796.317 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 92.61 | iteration 16023/ 292968 | consumed samples: 32815104 | consumed tokens: 16458727424 | elapsed time per iteration (ms): 134380.1 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.666313E+00 | loss scale: 65536.0 | grad norm: 64671.553 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 92.95 | iteration 16024/ 292968 | consumed samples: 32817152 | consumed tokens: 16460660736 | elapsed time per iteration (ms): 133433.5 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.660163E+00 | loss scale: 65536.0 | grad norm: 66377.436 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 93.61 | iteration 16025/ 292968 | consumed samples: 32819200 | consumed tokens: 16462594048 | elapsed time per iteration (ms): 133299.3 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.697248E+00 | loss scale: 65536.0 | grad norm: 62750.713 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 93.70 | iteration 
16026/ 292968 | consumed samples: 32821248 | consumed tokens: 16464527360 | elapsed time per iteration (ms): 140257.5 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.693047E+00 | loss scale: 65536.0 | grad norm: 71532.019 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 89.05 |
iteration 16027/ 292968 | consumed samples: 32823296 | consumed tokens: 16466460672 | elapsed time per iteration (ms): 134451.4 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.661211E+00 | loss scale: 65536.0 | grad norm: 47728.886 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 92.90 |
iteration 16028/ 292968 | consumed samples: 32825344 | consumed tokens: 16468393984 | elapsed time per iteration (ms): 131182.6 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.696548E+00 | loss scale: 65536.0 | grad norm: 41307.392 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.21 |
iteration 16029/ 292968 | consumed samples: 32827392 | consumed tokens: 16470327296 | elapsed time per iteration (ms): 132368.8 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.670676E+00 | loss scale: 65536.0 | grad norm: 49767.490 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 94.36 |
iteration 16030/ 292968 | consumed samples: 32829440 | consumed tokens: 16472260608 | elapsed time per iteration (ms): 128551.3 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.668139E+00 | loss scale: 65536.0 | grad norm: 46994.079 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.16 |
iteration 16031/ 292968 | consumed samples: 32831488 | consumed tokens: 16474193920 | elapsed time per iteration (ms): 127564.0 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.685988E+00 | loss scale: 65536.0 | grad norm: 39339.131 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.92 |
iteration 16032/ 292968 | consumed samples: 32833536 | consumed tokens: 16476127232 | elapsed time per iteration (ms): 127145.6 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.688291E+00 | loss scale: 65536.0 | grad norm: 41611.332 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.24 |
iteration 16033/ 292968 | consumed samples: 32835584 | consumed tokens: 16478060544 | elapsed time per iteration (ms): 126691.5 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.677119E+00 | loss scale: 65536.0 | grad norm: 65586.683 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.59 |
iteration 16034/ 292968 | consumed samples: 32837632 | consumed tokens: 16479993856 | elapsed time per iteration (ms): 126588.9 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.674102E+00 | loss scale: 65536.0 | grad norm: 96272.758 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.67 |
iteration 16035/ 292968 | consumed samples: 32839680 | consumed tokens: 16481927168 | elapsed time per iteration (ms): 126281.3 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.699074E+00 | loss scale: 65536.0 | grad norm: 30657.648 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.91 |
iteration 16036/ 292968 | consumed samples: 32841728 | consumed tokens: 16483860480 | elapsed time per iteration (ms): 124818.8 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.682159E+00 | loss scale: 65536.0 | grad norm: 73232.940 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 100.07 |
iteration 16037/ 292968 | consumed samples: 32843776 | consumed tokens: 16485793792 | elapsed time per iteration (ms): 125428.0 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.695874E+00 | loss scale: 65536.0 | grad norm: 93228.280 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.58 |
iteration 16038/ 292968 | consumed samples: 32845824 | consumed tokens: 16487727104 | elapsed time per iteration (ms): 122561.9 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.684866E+00 | loss scale: 65536.0 | grad norm: 41591.380 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.91 |
iteration 16039/ 292968 | consumed samples: 32847872 | consumed tokens: 16489660416 | elapsed time per iteration (ms): 123416.4 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.692347E+00 | loss scale: 65536.0 | grad norm: 115843.015 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.21 |
iteration 16040/ 292968 | consumed samples: 32849920 | consumed tokens: 16491593728 | elapsed time per iteration (ms): 123005.2 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.665739E+00 | loss scale: 65536.0 | grad norm: 43104.759 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.54 |
iteration 16041/ 292968 | consumed samples: 32851968 | consumed tokens: 16493527040 | elapsed time per iteration (ms): 122961.0 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.677049E+00 | loss scale: 65536.0 | grad norm: 93295.268 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.58 |
iteration 16042/ 292968 | consumed samples: 32854016 | consumed tokens: 16495460352 | elapsed time per iteration (ms): 122642.8 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.676794E+00 | loss scale: 65536.0 | grad norm: 34648.856 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.84 |
iteration 16043/ 292968 | consumed samples: 32856064 | consumed tokens: 16497393664 | elapsed time per iteration (ms): 122064.9 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.687354E+00 | loss scale: 65536.0 | grad norm: 78181.893 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 102.33 |
iteration 16044/ 292968 | consumed samples: 32858112 | consumed tokens: 16499326976 | elapsed time per iteration (ms): 122722.1 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.706172E+00 | loss scale: 65536.0 | grad norm: 62033.609 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.78 |
iteration 16045/ 292968 | consumed samples: 32860160 | consumed tokens: 16501260288 | elapsed time per iteration (ms): 122915.9 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.674217E+00 | loss scale: 65536.0 | grad norm: 63517.974 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.62 |
iteration 16046/ 292968 | consumed samples: 32862208 | consumed tokens: 16503193600 | elapsed time per iteration (ms): 125236.6 | learning rate: 5.947E-05 | global batch size: 2048 | lm loss: 2.679890E+00 | loss scale: 65536.0 | grad norm: 58026.029 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.74 |
iteration 16047/ 292968 | consumed samples: 32864256 | consumed tokens: 16505126912 | elapsed time per iteration (ms): 123676.4 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.686498E+00 | loss scale: 65536.0 | grad norm: 59806.230 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.99 |
iteration 16048/ 292968 | consumed samples: 32866304 | consumed tokens: 16507060224 | elapsed time per iteration (ms): 125091.2 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.708706E+00 | loss scale: 65536.0 | grad norm: 61190.805 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.85 |
iteration 16049/ 292968 | consumed samples: 32868352 | consumed tokens: 16508993536 | elapsed time per iteration (ms): 122612.6 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.678390E+00 | loss scale: 65536.0 | grad norm: 63003.127 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.87 |
iteration 16050/ 292968 | consumed samples: 32870400 | consumed tokens: 16510926848 | elapsed time per iteration (ms): 122972.8 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.687453E+00 | loss scale: 65536.0 | grad norm: 76067.633 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.57 |
-------------------------------------------------------------------------------------------
valid loss at iteration 16050 | lm loss value: 3.299144E+00 | lm loss PPL: 2.708943E+01 |
-------------------------------------------------------------------------------------------
saving checkpoint at iteration 16050 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
[2022-02-04 22:33:49,205] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/mp_rank_01_model_states.pt
[2022-02-04 22:33:49,439] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/mp_rank_00_model_states.pt
[2022-02-04 22:36:12,173] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_118_optim_states.pt
[2022-02-04 22:36:18,570] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_18_optim_states.pt
[2022-02-04 22:36:23,195] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_72_optim_states.pt
[2022-02-04 22:36:24,083] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_33_optim_states.pt
[2022-02-04 22:36:26,016] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_06_optim_states.pt
[2022-02-04 22:36:26,277] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_63_optim_states.pt
[2022-02-04 22:36:26,947] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_19_optim_states.pt
[2022-02-04 22:36:27,012] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_84_optim_states.pt
[2022-02-04 22:36:27,182] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_62_optim_states.pt
[2022-02-04 22:36:27,111] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_85_optim_states.pt
[2022-02-04 22:36:27,485] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_08_optim_states.pt
[2022-02-04 22:36:27,538] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_73_optim_states.pt
[2022-02-04 22:36:28,535] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_107_optim_states.pt
[2022-02-04 22:36:29,580] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_106_optim_states.pt
[2022-02-04 22:36:29,693] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_110_optim_states.pt
[2022-02-04 22:36:29,699] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_109_optim_states.pt
[2022-02-04 22:36:29,710] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_111_optim_states.pt
[2022-02-04 22:36:30,079] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_123_optim_states.pt
[2022-02-04 22:36:30,219] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_122_optim_states.pt
[2022-02-04 22:36:30,582] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_104_optim_states.pt
[2022-02-04 22:36:30,731] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_105_optim_states.pt
[2022-02-04 22:36:30,986] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_57_optim_states.pt
[2022-02-04 22:36:30,839] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_108_optim_states.pt
[2022-02-04 22:36:31,097] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_127_optim_states.pt
[2022-02-04 22:36:31,235] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_97_optim_states.pt
[2022-02-04 22:36:31,421] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_126_optim_states.pt
[2022-02-04 22:36:31,551] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_82_optim_states.pt
[2022-02-04 22:36:31,716] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_96_optim_states.pt
[2022-02-04 22:36:31,758] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_87_optim_states.pt
[2022-02-04 22:36:31,784] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_86_optim_states.pt
[2022-02-04 22:36:31,936] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_83_optim_states.pt
[2022-02-04 22:36:33,201] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_81_optim_states.pt
[2022-02-04 22:36:33,263] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_80_optim_states.pt
[2022-02-04 22:36:33,958] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_100_optim_states.pt
[2022-02-04 22:36:33,857] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_46_optim_states.pt
[2022-02-04 22:36:34,228] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_76_optim_states.pt
[2022-02-04 22:36:34,459] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_121_optim_states.pt
[2022-02-04 22:36:35,206] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_56_optim_states.pt
[2022-02-04 22:36:35,594] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_120_optim_states.pt
[2022-02-04 22:36:35,586] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_44_optim_states.pt
[2022-02-04 22:36:36,040] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_59_optim_states.pt
[2022-02-04 22:36:36,054] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_58_optim_states.pt
[2022-02-04 22:36:36,464] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_16_optim_states.pt
[2022-02-04 22:36:36,533] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_17_optim_states.pt
[2022-02-04 22:36:36,538] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_21_optim_states.pt
[2022-02-04 22:36:36,552] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_20_optim_states.pt
[2022-02-04 22:36:36,886] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_40_optim_states.pt
[2022-02-04 22:36:36,928] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_47_optim_states.pt
[2022-02-04 22:36:36,949] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_41_optim_states.pt
[2022-02-04 22:36:37,042] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_45_optim_states.pt
[2022-02-04 22:36:37,422] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_07_optim_states.pt
[2022-02-04 22:36:37,463] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_77_optim_states.pt
[2022-02-04 22:36:37,576] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_119_optim_states.pt
[2022-02-04 22:36:37,753] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_60_optim_states.pt
[2022-02-04 22:36:37,770] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_61_optim_states.pt
[2022-02-04 22:36:38,011] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_22_optim_states.pt
[2022-02-04 22:36:38,183] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_23_optim_states.pt
[2022-02-04 22:36:38,259] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_42_optim_states.pt
[2022-02-04 22:36:38,275] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_43_optim_states.pt
[2022-02-04 22:36:38,759] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_02_optim_states.pt
[2022-02-04 22:36:38,880] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_03_optim_states.pt
[2022-02-04 22:36:39,772] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_32_optim_states.pt
[2022-02-04 22:36:39,841] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_05_optim_states.pt
[2022-02-04 22:36:40,422] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_04_optim_states.pt
[2022-02-04 22:36:40,464] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_26_optim_states.pt
[2022-02-04 22:36:40,942] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_25_optim_states.pt
[2022-02-04 22:36:41,179] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_31_optim_states.pt
[2022-02-04 22:36:41,190] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_24_optim_states.pt
[2022-02-04 22:36:41,273] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_124_optim_states.pt
[2022-02-04 22:36:41,350] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_27_optim_states.pt
[2022-02-04 22:36:41,306] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_125_optim_states.pt
[2022-02-04 22:36:41,477] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_28_optim_states.pt
[2022-02-04 22:36:41,494] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_68_optim_states.pt
[2022-02-04 22:36:41,534] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_102_optim_states.pt
[2022-02-04 22:36:41,543] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_101_optim_states.pt
[2022-02-04 22:36:41,547] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_103_optim_states.pt
[2022-02-04 22:36:41,611] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_29_optim_states.pt
[2022-02-04 22:36:41,617] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_70_optim_states.pt
[2022-02-04 22:36:41,682] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_71_optim_states.pt
[2022-02-04 22:36:41,718] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_69_optim_states.pt
[2022-02-04 22:36:41,858] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_79_optim_states.pt
[2022-02-04 22:36:42,534] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_30_optim_states.pt
[2022-02-04 22:36:42,547] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_78_optim_states.pt
[2022-02-04 22:36:43,032] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_90_optim_states.pt
[2022-02-04 22:36:43,227] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_98_optim_states.pt
[2022-02-04 22:36:43,275] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_99_optim_states.pt
[2022-02-04 22:36:44,022] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_14_optim_states.pt
[2022-02-04 22:36:44,659] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_113_optim_states.pt
[2022-02-04 22:36:44,688] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_112_optim_states.pt
[2022-02-04 22:36:44,745] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_64_optim_states.pt
[2022-02-04 22:36:44,767] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_91_optim_states.pt
[2022-02-04 22:36:44,817] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_65_optim_states.pt
[2022-02-04 22:36:44,841] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_115_optim_states.pt
[2022-02-04 22:36:45,462] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_49_optim_states.pt
[2022-02-04 22:36:45,916] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_67_optim_states.pt
[2022-02-04 22:36:45,982] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_88_optim_states.pt
[2022-02-04 22:36:46,039] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_95_optim_states.pt
[2022-02-04 22:36:46,052] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_94_optim_states.pt
[2022-02-04 22:36:46,096] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_89_optim_states.pt
[2022-02-04 22:36:46,337] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_117_optim_states.pt
[2022-02-04 22:36:46,391] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_66_optim_states.pt
[2022-02-04 22:36:46,448] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_116_optim_states.pt
[2022-02-04 22:36:46,635] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_75_optim_states.pt
[2022-02-04 22:36:46,683] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_74_optim_states.pt
[2022-02-04 22:36:46,686] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_114_optim_states.pt
[2022-02-04 22:36:47,137] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_93_optim_states.pt
[2022-02-04 22:36:47,271] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_92_optim_states.pt
[2022-02-04 22:36:48,163] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_48_optim_states.pt
[2022-02-04 22:36:53,302] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2022-02-04 22:36:53,420] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_01_optim_states.pt
[2022-02-04 22:36:54,103] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_51_optim_states.pt
[2022-02-04 22:36:54,917] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_38_optim_states.pt
[2022-02-04 22:36:55,439] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_55_optim_states.pt
[2022-02-04 22:36:55,481] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_50_optim_states.pt
[2022-02-04 22:36:55,733] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_54_optim_states.pt
[2022-02-04 22:36:57,236] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_52_optim_states.pt
[2022-02-04 22:36:57,265] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_53_optim_states.pt
[2022-02-04 22:36:57,478] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_10_optim_states.pt
[2022-02-04 22:37:06,673] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_35_optim_states.pt
[2022-02-04 22:37:09,090] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_11_optim_states.pt
[2022-02-04 22:37:17,445] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_34_optim_states.pt
[2022-02-04 22:37:20,065] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_13_optim_states.pt
[2022-02-04 22:37:27,861] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_36_optim_states.pt
[2022-02-04 22:37:30,783] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_15_optim_states.pt
[2022-02-04 22:37:32,093] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_09_optim_states.pt
[2022-02-04 22:37:32,361] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_12_optim_states.pt
[2022-02-04 22:37:38,248] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_37_optim_states.pt
[2022-02-04 22:37:48,016] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16050/zero_pp_rank_0_mp_rank_39_optim_states.pt
successfully saved checkpoint at iteration 16050 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
time (ms) | save-checkpoint: 269002.11
iteration 16051/ 292968 | consumed samples: 32872448 | consumed tokens: 16512860160 | elapsed time per iteration (ms): 807610.4 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.690274E+00 | loss scale: 65536.0 | grad norm: 65807.894 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.003 | TFLOPs: 15.47 |
iteration 16052/ 292968 | consumed samples: 32874496 | consumed tokens: 16514793472 | elapsed time per iteration (ms): 151168.1 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.685056E+00 | loss scale: 65536.0 | grad norm: 54622.348 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 82.63 |
iteration 16053/ 292968 | consumed samples: 32876544 | consumed tokens: 16516726784 | elapsed time per iteration (ms): 136740.9 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.688634E+00 | loss scale: 65536.0 | grad norm: 57342.678 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 91.34 |
iteration 16054/ 292968 | consumed samples: 32878592 | consumed tokens: 16518660096 | elapsed time per iteration (ms): 136291.1 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.675601E+00 | loss scale: 65536.0 | grad norm: 46251.102 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 91.65 |
iteration 16055/ 292968 | consumed
samples: 32880640 | consumed tokens: 16520593408 | elapsed time per iteration (ms): 130990.4 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.686143E+00 | loss scale: 65536.0 | grad norm: 50767.304 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.35 | iteration 16056/ 292968 | consumed samples: 32882688 | consumed tokens: 16522526720 | elapsed time per iteration (ms): 129905.3 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.703512E+00 | loss scale: 65536.0 | grad norm: 58150.280 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.15 | iteration 16057/ 292968 | consumed samples: 32884736 | consumed tokens: 16524460032 | elapsed time per iteration (ms): 130487.4 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.676800E+00 | loss scale: 65536.0 | grad norm: 50755.365 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.72 | iteration 16058/ 292968 | consumed samples: 32886784 | consumed tokens: 16526393344 | elapsed time per iteration (ms): 126561.8 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.665383E+00 | loss scale: 65536.0 | grad norm: 55368.453 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.69 | iteration 16059/ 292968 | consumed samples: 32888832 | consumed tokens: 16528326656 | elapsed time per iteration (ms): 126662.9 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.685134E+00 | loss scale: 65536.0 | grad norm: 61971.483 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.61 | iteration 16060/ 292968 
| consumed samples: 32890880 | consumed tokens: 16530259968 | elapsed time per iteration (ms): 124113.9 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.666243E+00 | loss scale: 65536.0 | grad norm: 48382.308 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.64 | iteration 16061/ 292968 | consumed samples: 32892928 | consumed tokens: 16532193280 | elapsed time per iteration (ms): 123839.3 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.665228E+00 | loss scale: 65536.0 | grad norm: 46535.303 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.86 | iteration 16062/ 292968 | consumed samples: 32894976 | consumed tokens: 16534126592 | elapsed time per iteration (ms): 123506.3 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.694000E+00 | loss scale: 65536.0 | grad norm: 53344.350 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.13 | iteration 16063/ 292968 | consumed samples: 32897024 | consumed tokens: 16536059904 | elapsed time per iteration (ms): 131206.4 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.679291E+00 | loss scale: 65536.0 | grad norm: 59879.481 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.20 | iteration 16064/ 292968 | consumed samples: 32899072 | consumed tokens: 16537993216 | elapsed time per iteration (ms): 130990.9 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.678590E+00 | loss scale: 65536.0 | grad norm: 48773.367 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.35 | iteration 
16065/ 292968 | consumed samples: 32901120 | consumed tokens: 16539926528 | elapsed time per iteration (ms): 129769.5 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.678338E+00 | loss scale: 65536.0 | grad norm: 33512.270 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.25 | iteration 16066/ 292968 | consumed samples: 32903168 | consumed tokens: 16541859840 | elapsed time per iteration (ms): 128882.5 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.675675E+00 | loss scale: 65536.0 | grad norm: 40990.010 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.91 | iteration 16067/ 292968 | consumed samples: 32905216 | consumed tokens: 16543793152 | elapsed time per iteration (ms): 129697.8 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.710522E+00 | loss scale: 65536.0 | grad norm: 64875.678 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.30 | iteration 16068/ 292968 | consumed samples: 32907264 | consumed tokens: 16545726464 | elapsed time per iteration (ms): 129679.4 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.670231E+00 | loss scale: 65536.0 | grad norm: 74692.435 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.32 | iteration 16069/ 292968 | consumed samples: 32909312 | consumed tokens: 16547659776 | elapsed time per iteration (ms): 128183.7 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.662552E+00 | loss scale: 65536.0 | grad norm: 55737.103 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.44 | 
iteration 16070/ 292968 | consumed samples: 32911360 | consumed tokens: 16549593088 | elapsed time per iteration (ms): 125367.9 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.689342E+00 | loss scale: 65536.0 | grad norm: 63670.419 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.63 | iteration 16071/ 292968 | consumed samples: 32913408 | consumed tokens: 16551526400 | elapsed time per iteration (ms): 122505.9 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.679252E+00 | loss scale: 65536.0 | grad norm: 67769.619 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.96 | iteration 16072/ 292968 | consumed samples: 32915456 | consumed tokens: 16553459712 | elapsed time per iteration (ms): 124040.6 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.670770E+00 | loss scale: 65536.0 | grad norm: 42682.435 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.70 | iteration 16073/ 292968 | consumed samples: 32917504 | consumed tokens: 16555393024 | elapsed time per iteration (ms): 121569.9 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.702013E+00 | loss scale: 65536.0 | grad norm: 44101.114 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 102.74 | iteration 16074/ 292968 | consumed samples: 32919552 | consumed tokens: 16557326336 | elapsed time per iteration (ms): 122761.4 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.655235E+00 | loss scale: 65536.0 | grad norm: 46232.772 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | 
TFLOPs: 101.75 | iteration 16075/ 292968 | consumed samples: 32921600 | consumed tokens: 16559259648 | elapsed time per iteration (ms): 122446.6 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.681386E+00 | loss scale: 65536.0 | grad norm: 49847.502 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 102.01 | iteration 16076/ 292968 | consumed samples: 32923648 | consumed tokens: 16561192960 | elapsed time per iteration (ms): 122408.2 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.690196E+00 | loss scale: 65536.0 | grad norm: 69059.340 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 102.04 | iteration 16077/ 292968 | consumed samples: 32925696 | consumed tokens: 16563126272 | elapsed time per iteration (ms): 123778.0 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.680676E+00 | loss scale: 65536.0 | grad norm: 58314.480 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.91 | iteration 16078/ 292968 | consumed samples: 32927744 | consumed tokens: 16565059584 | elapsed time per iteration (ms): 122151.6 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.691298E+00 | loss scale: 65536.0 | grad norm: 65009.765 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 102.25 | iteration 16079/ 292968 | consumed samples: 32929792 | consumed tokens: 16566992896 | elapsed time per iteration (ms): 126216.3 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.687706E+00 | loss scale: 65536.0 | grad norm: 66590.209 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 0.016 | TFLOPs: 98.96 | iteration 16080/ 292968 | consumed samples: 32931840 | consumed tokens: 16568926208 | elapsed time per iteration (ms): 123011.3 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.677490E+00 | loss scale: 65536.0 | grad norm: 56837.698 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.54 | iteration 16081/ 292968 | consumed samples: 32933888 | consumed tokens: 16570859520 | elapsed time per iteration (ms): 122637.7 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.672918E+00 | loss scale: 65536.0 | grad norm: 51190.087 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.85 | iteration 16082/ 292968 | consumed samples: 32935936 | consumed tokens: 16572792832 | elapsed time per iteration (ms): 123149.4 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.681258E+00 | loss scale: 65536.0 | grad norm: 37733.198 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.43 | iteration 16083/ 292968 | consumed samples: 32937984 | consumed tokens: 16574726144 | elapsed time per iteration (ms): 125315.4 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.654139E+00 | loss scale: 65536.0 | grad norm: 54249.650 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.67 | iteration 16084/ 292968 | consumed samples: 32940032 | consumed tokens: 16576659456 | elapsed time per iteration (ms): 123167.8 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.677903E+00 | loss scale: 65536.0 | grad norm: 73099.471 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 0.017 | TFLOPs: 101.41 | iteration 16085/ 292968 | consumed samples: 32942080 | consumed tokens: 16578592768 | elapsed time per iteration (ms): 124044.2 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.711192E+00 | loss scale: 65536.0 | grad norm: 41224.998 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.69 | iteration 16086/ 292968 | consumed samples: 32944128 | consumed tokens: 16580526080 | elapsed time per iteration (ms): 122828.6 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.678984E+00 | loss scale: 65536.0 | grad norm: 48782.495 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.69 | iteration 16087/ 292968 | consumed samples: 32946176 | consumed tokens: 16582459392 | elapsed time per iteration (ms): 123527.8 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.694522E+00 | loss scale: 65536.0 | grad norm: 48211.083 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.12 | iteration 16088/ 292968 | consumed samples: 32948224 | consumed tokens: 16584392704 | elapsed time per iteration (ms): 122336.2 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.702344E+00 | loss scale: 65536.0 | grad norm: 48252.782 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 102.10 | iteration 16089/ 292968 | consumed samples: 32950272 | consumed tokens: 16586326016 | elapsed time per iteration (ms): 123446.8 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.679575E+00 | loss scale: 65536.0 | grad norm: 50320.615 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.18 | iteration 16090/ 292968 | consumed samples: 32952320 | consumed tokens: 16588259328 | elapsed time per iteration (ms): 127168.7 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.699792E+00 | loss scale: 65536.0 | grad norm: 58032.469 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.22 | iteration 16091/ 292968 | consumed samples: 32954368 | consumed tokens: 16590192640 | elapsed time per iteration (ms): 125780.4 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.681907E+00 | loss scale: 65536.0 | grad norm: 62001.217 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.30 | iteration 16092/ 292968 | consumed samples: 32956416 | consumed tokens: 16592125952 | elapsed time per iteration (ms): 123800.1 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.696507E+00 | loss scale: 65536.0 | grad norm: 77630.950 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.89 | iteration 16093/ 292968 | consumed samples: 32958464 | consumed tokens: 16594059264 | elapsed time per iteration (ms): 123274.0 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.697795E+00 | loss scale: 65536.0 | grad norm: 83720.158 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.32 | iteration 16094/ 292968 | consumed samples: 32960512 | consumed tokens: 16595992576 | elapsed time per iteration (ms): 127334.0 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.681477E+00 | loss scale: 65536.0 | grad norm: 50229.540 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.09 | iteration 16095/ 292968 | consumed samples: 32962560 | consumed tokens: 16597925888 | elapsed time per iteration (ms): 123620.4 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.684816E+00 | loss scale: 65536.0 | grad norm: 46099.938 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.04 | iteration 16096/ 292968 | consumed samples: 32964608 | consumed tokens: 16599859200 | elapsed time per iteration (ms): 124080.1 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.692239E+00 | loss scale: 65536.0 | grad norm: 74424.546 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.67 | iteration 16097/ 292968 | consumed samples: 32966656 | consumed tokens: 16601792512 | elapsed time per iteration (ms): 124024.3 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.685067E+00 | loss scale: 65536.0 | grad norm: 73121.831 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 100.71 | iteration 16098/ 292968 | consumed samples: 32968704 | consumed tokens: 16603725824 | elapsed time per iteration (ms): 124152.5 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.690224E+00 | loss scale: 65536.0 | grad norm: 37978.962 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 100.61 | iteration 16099/ 292968 | consumed samples: 32970752 | consumed tokens: 16605659136 | elapsed time per iteration (ms): 125655.4 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.674048E+00 | loss scale: 65536.0 | grad norm: 75933.162 | num zeros: 0.0 | curriculum seqlen: 944 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.40 | iteration 16100/ 292968 | consumed samples: 32972800 | consumed tokens: 16607592448 | elapsed time per iteration (ms): 124387.1 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.673286E+00 | loss scale: 65536.0 | grad norm: 72952.150 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 100.42 | saving checkpoint at iteration 16100 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 [2022-02-05 00:24:03,875] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/mp_rank_00_model_states.pt [2022-02-05 00:24:04,105] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/mp_rank_01_model_states.pt [2022-02-05 00:28:10,877] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-02-05 00:28:11,272] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-02-05 00:28:11,447] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-02-05 00:28:11,663] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-02-05 00:28:11,780] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-02-05 00:28:12,032] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-02-05 00:28:12,042] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-02-05 00:28:12,119] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-02-05 00:28:13,983] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-02-05 00:28:14,235] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-02-05 00:28:14,363] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-02-05 00:28:14,431] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-02-05 00:28:14,436] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-02-05 00:28:14,508] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-02-05 00:28:14,910] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-02-05 00:28:15,215] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-02-05 00:28:15,266] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-02-05 00:28:16,078] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-02-05 00:28:16,204] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-02-05 00:28:16,268] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-02-05 00:28:16,497] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-02-05 00:28:16,628] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-02-05 
00:28:16,635] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-02-05 00:28:17,204] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-02-05 00:28:17,377] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-02-05 00:28:17,695] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-02-05 00:28:17,730] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-02-05 00:28:17,774] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-02-05 00:28:17,751] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-02-05 00:28:18,761] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-02-05 00:28:18,829] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-02-05 00:28:19,124] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-02-05 00:28:19,242] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-02-05 00:28:19,410] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-02-05 00:28:19,410] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-02-05 00:28:19,712] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-02-05 00:28:20,362] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-02-05 00:28:20,417] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-02-05 00:28:20,426] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-02-05 00:28:21,055] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-02-05 00:28:21,151] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-02-05 00:28:21,916] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-02-05 00:28:22,066] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-02-05 00:28:22,862] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-02-05 00:28:22,867] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-02-05 00:28:22,906] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-02-05 00:28:22,995] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-02-05 00:28:23,231] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-02-05 00:28:23,304] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-02-05 00:28:23,350] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-02-05 00:28:23,413] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-02-05 00:28:24,580] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-02-05 00:28:24,719] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-02-05 00:28:24,814] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-02-05 00:28:25,336] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-02-05 00:28:25,630] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-02-05 00:28:25,701] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-02-05 
00:28:25,726] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-02-05 00:28:25,817] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-02-05 00:28:26,360] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-02-05 00:28:26,541] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-02-05 00:28:26,885] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-02-05 00:28:27,083] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-02-05 00:28:27,154] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-02-05 00:28:27,837] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-02-05 00:28:28,121] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-02-05 00:28:28,249] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-02-05 00:28:28,502] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-02-05 00:28:28,600] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-02-05 00:28:28,664] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-02-05 00:28:28,625] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-02-05 00:28:31,602] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-02-05 00:28:32,605] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-02-05 00:28:33,366] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-02-05 00:28:33,466] [INFO] [engine.py:3007:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-02-05 00:28:33,470] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-02-05 00:28:33,639] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-02-05 00:28:35,185] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-02-05 00:28:35,209] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-02-05 00:28:35,910] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-02-05 00:28:37,634] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_124_optim_states.pt [2022-02-05 00:28:37,833] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-02-05 00:28:37,840] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-02-05 00:28:38,219] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-02-05 00:28:38,261] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-02-05 00:28:38,511] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-02-05 00:28:38,532] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-02-05 00:28:38,623] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-02-05 00:28:39,004] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-02-05 00:28:39,034] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-02-05 00:28:39,412] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-02-05 00:28:39,588] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-02-05 
00:28:39,630] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-02-05 00:28:50,382] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-02-05 00:28:50,979] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-02-05 00:28:51,166] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-02-05 00:28:51,246] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-02-05 00:28:51,290] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-02-05 00:28:51,410] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-02-05 00:28:51,688] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-02-05 00:29:01,458] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-02-05 00:29:01,677] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-02-05 00:29:01,716] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-02-05 00:29:02,451] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-02-05 00:29:02,585] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-02-05 00:29:02,962] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-02-05 00:29:03,477] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-02-05 00:29:12,909] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-02-05 00:29:13,222] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-02-05 00:29:13,406] [INFO] [engine.py:3007:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-02-05 00:29:13,555] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-02-05 00:29:13,738] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-02-05 00:29:13,757] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-02-05 00:29:13,811] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-02-05 00:29:24,556] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-02-05 00:29:24,682] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-02-05 00:29:25,038] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-02-05 00:29:25,226] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-02-05 00:29:25,685] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-02-05 00:29:25,755] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-02-05 00:29:25,964] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-02-05 00:29:36,525] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-02-05 00:29:36,857] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-02-05 00:29:37,033] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-02-05 00:29:37,206] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-02-05 00:29:37,297] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-02-05 00:29:37,792] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-02-05 
00:29:37,965] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16100/zero_pp_rank_0_mp_rank_67_optim_states.pt successfully saved checkpoint at iteration 16100 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 time (ms) | save-checkpoint: 366249.68 iteration 16101/ 292968 | consumed samples: 32974848 | consumed tokens: 16609525760 | elapsed time per iteration (ms): 500131.0 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.691069E+00 | loss scale: 65536.0 | grad norm: 48657.221 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.004 | TFLOPs: 24.97 | iteration 16102/ 292968 | consumed samples: 32976896 | consumed tokens: 16611459072 | elapsed time per iteration (ms): 136071.8 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.698471E+00 | loss scale: 65536.0 | grad norm: 54083.491 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 91.79 | iteration 16103/ 292968 | consumed samples: 32978944 | consumed tokens: 16613392384 | elapsed time per iteration (ms): 132202.3 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.690155E+00 | loss scale: 65536.0 | grad norm: 55614.577 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 94.48 | iteration 16104/ 292968 | consumed samples: 32980992 | consumed tokens: 16615325696 | elapsed time per iteration (ms): 130341.6 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.683866E+00 | loss scale: 65536.0 | grad norm: 39335.073 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 95.83 | iteration 16105/ 
292968 | consumed samples: 32983040 | consumed tokens: 16617259008 | elapsed time per iteration (ms): 128637.0 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.695318E+00 | loss scale: 65536.0 | grad norm: 48488.531 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.10 | iteration 16106/ 292968 | consumed samples: 32985088 | consumed tokens: 16619192320 | elapsed time per iteration (ms): 128042.8 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.689407E+00 | loss scale: 65536.0 | grad norm: 59132.678 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.55 | iteration 16107/ 292968 | consumed samples: 32987136 | consumed tokens: 16621125632 | elapsed time per iteration (ms): 126412.4 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.660828E+00 | loss scale: 65536.0 | grad norm: 44719.807 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.81 | iteration 16108/ 292968 | consumed samples: 32989184 | consumed tokens: 16623058944 | elapsed time per iteration (ms): 127229.1 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.665599E+00 | loss scale: 65536.0 | grad norm: 35838.018 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.17 | iteration 16109/ 292968 | consumed samples: 32991232 | consumed tokens: 16624992256 | elapsed time per iteration (ms): 126035.4 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.679210E+00 | loss scale: 65536.0 | grad norm: 50272.577 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.10 | 
iteration 16110/ 292968 | consumed samples: 32993280 | consumed tokens: 16626925568 | elapsed time per iteration (ms): 126710.7 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.678720E+00 | loss scale: 65536.0 | grad norm: 59826.420 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.58 | iteration 16111/ 292968 | consumed samples: 32995328 | consumed tokens: 16628858880 | elapsed time per iteration (ms): 126781.8 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.692469E+00 | loss scale: 65536.0 | grad norm: 68455.430 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.52 | iteration 16112/ 292968 | consumed samples: 32997376 | consumed tokens: 16630792192 | elapsed time per iteration (ms): 125364.8 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.668903E+00 | loss scale: 65536.0 | grad norm: 63514.680 | num zeros: 0.0 | curriculum seqlen: 944 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.63 | iteration 16113/ 292968 | consumed samples: 32999424 | consumed tokens: 16632741888 | elapsed time per iteration (ms): 125879.9 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.662245E+00 | loss scale: 65536.0 | grad norm: 72451.931 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 100.07 | iteration 16114/ 292968 | consumed samples: 33001472 | consumed tokens: 16634691584 | elapsed time per iteration (ms): 128749.7 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.660008E+00 | loss scale: 65536.0 | grad norm: 59895.164 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | 
TFLOPs: 97.84 | iteration 16115/ 292968 | consumed samples: 33003520 | consumed tokens: 16636641280 | elapsed time per iteration (ms): 127503.0 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.672443E+00 | loss scale: 65536.0 | grad norm: 64291.317 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.79 | iteration 16116/ 292968 | consumed samples: 33005568 | consumed tokens: 16638590976 | elapsed time per iteration (ms): 126427.5 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.659560E+00 | loss scale: 65536.0 | grad norm: 67012.708 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.63 | iteration 16117/ 292968 | consumed samples: 33007616 | consumed tokens: 16640540672 | elapsed time per iteration (ms): 126467.3 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.684789E+00 | loss scale: 65536.0 | grad norm: 51935.766 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.60 | iteration 16118/ 292968 | consumed samples: 33009664 | consumed tokens: 16642490368 | elapsed time per iteration (ms): 127643.2 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.692091E+00 | loss scale: 65536.0 | grad norm: 63088.443 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.68 | iteration 16119/ 292968 | consumed samples: 33011712 | consumed tokens: 16644440064 | elapsed time per iteration (ms): 128173.0 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.666381E+00 | loss scale: 65536.0 | grad norm: 75613.371 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 0.016 | TFLOPs: 98.28 | iteration 16120/ 292968 | consumed samples: 33013760 | consumed tokens: 16646389760 | elapsed time per iteration (ms): 126475.1 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.682797E+00 | loss scale: 65536.0 | grad norm: 41262.820 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.60 | iteration 16121/ 292968 | consumed samples: 33015808 | consumed tokens: 16648339456 | elapsed time per iteration (ms): 127353.7 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.647893E+00 | loss scale: 65536.0 | grad norm: 65124.374 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.91 | iteration 16122/ 292968 | consumed samples: 33017856 | consumed tokens: 16650289152 | elapsed time per iteration (ms): 123638.7 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.693524E+00 | loss scale: 65536.0 | grad norm: 77523.861 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.88 | iteration 16123/ 292968 | consumed samples: 33019904 | consumed tokens: 16652238848 | elapsed time per iteration (ms): 125113.0 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.683824E+00 | loss scale: 65536.0 | grad norm: 25728.197 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 100.68 | iteration 16124/ 292968 | consumed samples: 33021952 | consumed tokens: 16654188544 | elapsed time per iteration (ms): 123488.5 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.698336E+00 | loss scale: 65536.0 | grad norm: 66297.452 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 0.017 | TFLOPs: 102.00 | iteration 16125/ 292968 | consumed samples: 33024000 | consumed tokens: 16656138240 | elapsed time per iteration (ms): 123300.5 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.673517E+00 | loss scale: 65536.0 | grad norm: 86777.390 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 102.16 | iteration 16126/ 292968 | consumed samples: 33026048 | consumed tokens: 16658087936 | elapsed time per iteration (ms): 127053.2 | learning rate: 5.946E-05 | global batch size: 2048 | lm loss: 2.666025E+00 | loss scale: 65536.0 | grad norm: 33816.713 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.14 | iteration 16127/ 292968 | consumed samples: 33028096 | consumed tokens: 16660037632 | elapsed time per iteration (ms): 124653.2 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.691122E+00 | loss scale: 65536.0 | grad norm: 96356.252 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 101.05 | iteration 16128/ 292968 | consumed samples: 33030144 | consumed tokens: 16661987328 | elapsed time per iteration (ms): 123100.9 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.666359E+00 | loss scale: 65536.0 | grad norm: 31419.740 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 102.33 | iteration 16129/ 292968 | consumed samples: 33032192 | consumed tokens: 16663937024 | elapsed time per iteration (ms): 124103.4 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.680783E+00 | loss scale: 65536.0 | grad norm: 67258.567 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 0.017 | TFLOPs: 101.50 | iteration 16130/ 292968 | consumed samples: 33034240 | consumed tokens: 16665886720 | elapsed time per iteration (ms): 125528.7 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.691964E+00 | loss scale: 65536.0 | grad norm: 70624.635 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 100.35 | iteration 16131/ 292968 | consumed samples: 33036288 | consumed tokens: 16667836416 | elapsed time per iteration (ms): 123643.9 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.692385E+00 | loss scale: 65536.0 | grad norm: 64425.402 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.88 | iteration 16132/ 292968 | consumed samples: 33038336 | consumed tokens: 16669786112 | elapsed time per iteration (ms): 126843.8 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.696510E+00 | loss scale: 65536.0 | grad norm: 102963.547 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.31 | iteration 16133/ 292968 | consumed samples: 33040384 | consumed tokens: 16671735808 | elapsed time per iteration (ms): 124851.8 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.693777E+00 | loss scale: 65536.0 | grad norm: 42218.816 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 100.89 | iteration 16134/ 292968 | consumed samples: 33042432 | consumed tokens: 16673685504 | elapsed time per iteration (ms): 123814.9 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.710939E+00 | loss scale: 65536.0 | grad norm: 125296.290 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 
| number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.74 | iteration 16135/ 292968 | consumed samples: 33044480 | consumed tokens: 16675635200 | elapsed time per iteration (ms): 123714.5 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.648367E+00 | loss scale: 65536.0 | grad norm: 53437.153 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.82 | iteration 16136/ 292968 | consumed samples: 33046528 | consumed tokens: 16677584896 | elapsed time per iteration (ms): 123519.4 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.715935E+00 | loss scale: 65536.0 | grad norm: 106059.275 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.98 | iteration 16137/ 292968 | consumed samples: 33048576 | consumed tokens: 16679534592 | elapsed time per iteration (ms): 123671.2 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.694496E+00 | loss scale: 65536.0 | grad norm: 68455.103 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.85 | iteration 16138/ 292968 | consumed samples: 33050624 | consumed tokens: 16681484288 | elapsed time per iteration (ms): 125371.6 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.700497E+00 | loss scale: 65536.0 | grad norm: 106133.403 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 100.47 | iteration 16139/ 292968 | consumed samples: 33052672 | consumed tokens: 16683433984 | elapsed time per iteration (ms): 125579.6 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.713696E+00 | loss scale: 65536.0 | grad norm: 80769.805 | num zeros: 0.0 | curriculum seqlen: 952 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 100.31 | iteration 16140/ 292968 | consumed samples: 33054720 | consumed tokens: 16685383680 | elapsed time per iteration (ms): 125882.9 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.678824E+00 | loss scale: 65536.0 | grad norm: 55561.698 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 100.06 | iteration 16141/ 292968 | consumed samples: 33056768 | consumed tokens: 16687333376 | elapsed time per iteration (ms): 124590.6 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.687184E+00 | loss scale: 65536.0 | grad norm: 69153.040 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 101.10 | iteration 16142/ 292968 | consumed samples: 33058816 | consumed tokens: 16689283072 | elapsed time per iteration (ms): 124213.3 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.715583E+00 | loss scale: 65536.0 | grad norm: 57029.302 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 101.41 | iteration 16143/ 292968 | consumed samples: 33060864 | consumed tokens: 16691232768 | elapsed time per iteration (ms): 122467.0 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 3.048000E+00 | loss scale: 65536.0 | grad norm: 601591.650 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 102.86 | [2022-02-05 02:02:12,767] [INFO] [stage_1_and_2.py:1648:step] [deepscale] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 65536.0, reducing to 65536.0 iteration 16144/ 292968 | consumed samples: 33062912 | consumed tokens: 16693182464 | elapsed time per iteration (ms): 124276.4 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 3.532701E+00 | loss scale: 65536.0 | grad norm: 601591.650 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 101.36 | [2022-02-05 02:04:15,885] [INFO] [stage_1_and_2.py:1648:step] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536.0, reducing to 32768.0 iteration 16145/ 292968 | consumed samples: 33064960 | consumed tokens: 16695132160 | elapsed time per iteration (ms): 123118.6 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 3.539801E+00 | loss scale: 32768.0 | grad norm: 601591.650 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 102.31 | iteration 16146/ 292968 | consumed samples: 33067008 | consumed tokens: 16697081856 | elapsed time per iteration (ms): 122782.1 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 3.530910E+00 | loss scale: 32768.0 | grad norm: 595093.129 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 102.59 | iteration 16147/ 292968 | consumed samples: 33069056 | consumed tokens: 16699031552 | elapsed time per iteration (ms): 122740.4 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.882886E+00 | loss scale: 32768.0 | grad norm: 91342.448 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 102.63 | iteration 16148/ 292968 | consumed samples: 33071104 | consumed tokens: 16700981248 | elapsed time per iteration (ms): 122327.0 | learning rate: 5.945E-05 | global batch size: 
2048 | lm loss: 2.753979E+00 | loss scale: 32768.0 | grad norm: 34059.760 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 102.97 | iteration 16149/ 292968 | consumed samples: 33073152 | consumed tokens: 16702930944 | elapsed time per iteration (ms): 122202.7 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.738262E+00 | loss scale: 32768.0 | grad norm: 38525.071 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 103.08 | iteration 16150/ 292968 | consumed samples: 33075200 | consumed tokens: 16704880640 | elapsed time per iteration (ms): 122241.0 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.748955E+00 | loss scale: 32768.0 | grad norm: 35774.647 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 103.05 | saving checkpoint at iteration 16150 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 [2022-02-05 02:17:03,068] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/mp_rank_01_model_states.pt [2022-02-05 02:17:04,130] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/mp_rank_00_model_states.pt [2022-02-05 02:38:32,882] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-02-05 02:38:41,775] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-02-05 02:38:45,459] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-02-05 02:38:45,670] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-02-05 02:38:45,816] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-02-05 02:38:47,560] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-02-05 02:38:47,803] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-02-05 02:38:47,699] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-02-05 02:38:47,952] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-02-05 02:38:47,985] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-02-05 02:38:48,030] [INFO] [engine.py:3007:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-02-05 02:38:48,051] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_124_optim_states.pt [2022-02-05 02:38:48,118] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-02-05 02:38:48,474] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-02-05 02:38:48,893] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-02-05 02:38:49,167] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-02-05 02:38:49,231] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-02-05 02:38:49,924] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-02-05 02:38:49,976] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-02-05 02:38:50,472] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-02-05 02:38:50,883] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-02-05 02:38:51,048] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-02-05 02:38:51,178] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-02-05 02:38:51,309] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-02-05 02:38:51,738] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-02-05 02:38:51,754] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-02-05 02:38:51,911] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-02-05 02:38:52,228] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-02-05 
02:38:52,295] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-02-05 02:38:52,355] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-02-05 02:38:52,405] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-02-05 02:38:52,767] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-02-05 02:38:52,848] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-02-05 02:38:52,893] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-02-05 02:38:53,093] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-02-05 02:38:53,105] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-02-05 02:38:53,165] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-02-05 02:38:53,313] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-02-05 02:38:53,398] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-02-05 02:38:53,462] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-02-05 02:38:53,522] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-02-05 02:38:53,990] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-02-05 02:38:54,315] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-02-05 02:38:54,368] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-02-05 02:38:54,371] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-02-05 02:38:54,437] [INFO] [engine.py:3007:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-02-05 02:38:54,455] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-02-05 02:38:54,588] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-02-05 02:38:54,660] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-02-05 02:38:54,940] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-02-05 02:38:55,024] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-02-05 02:38:54,903] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-02-05 02:38:55,550] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-02-05 02:38:55,867] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-02-05 02:38:55,900] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-02-05 02:38:56,438] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-02-05 02:38:56,635] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-02-05 02:38:56,705] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-02-05 02:38:56,713] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-02-05 02:38:56,771] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-02-05 02:38:57,378] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-02-05 02:38:57,494] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-02-05 02:38:57,465] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-02-05 
02:38:57,575] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-02-05 02:38:57,886] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-02-05 02:38:58,166] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-02-05 02:38:58,279] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-02-05 02:38:58,306] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-02-05 02:38:59,023] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-02-05 02:38:59,061] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-02-05 02:38:59,387] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-02-05 02:38:59,521] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-02-05 02:38:59,534] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-02-05 02:38:59,579] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-02-05 02:39:00,312] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-02-05 02:39:00,316] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-02-05 02:39:00,217] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-02-05 02:39:00,455] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-02-05 02:39:00,500] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-02-05 02:39:00,708] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-02-05 02:39:00,926] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-02-05 02:39:00,959] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-02-05 02:39:01,262] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-02-05 02:39:01,556] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-02-05 02:39:01,620] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-02-05 02:39:02,374] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-02-05 02:39:02,407] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-02-05 02:39:02,589] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-02-05 02:39:02,946] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-02-05 02:39:03,567] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-02-05 02:39:05,912] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-02-05 02:39:06,108] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-02-05 02:39:06,180] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-02-05 02:39:06,206] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-02-05 02:39:06,437] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-02-05 02:39:06,451] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-02-05 02:39:07,760] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-02-05 02:39:07,883] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-02-05 
02:39:41,286] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-02-05 02:39:46,577] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-02-05 02:39:48,910] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-02-05 02:40:11,830] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-02-05 02:40:12,129] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-02-05 02:40:12,181] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-02-05 02:40:12,381] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-02-05 02:40:13,717] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-02-05 02:40:13,902] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-02-05 02:40:48,914] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-02-05 02:40:55,175] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-02-05 02:40:57,273] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-02-05 02:40:58,203] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-02-05 02:41:45,840] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-02-05 02:41:47,035] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-02-05 02:41:52,646] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-02-05 02:41:52,717] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-02-05 02:41:52,816] [INFO] [engine.py:3007:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-02-05 02:41:52,880] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-02-05 02:41:54,193] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-02-05 02:41:54,213] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-02-05 02:42:26,187] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-02-05 02:42:27,318] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-02-05 02:42:33,828] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-02-05 02:42:34,282] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-02-05 02:42:34,423] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-02-05 02:42:34,706] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_66_optim_states.pt
[2022-02-05 02:42:42,326] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_114_optim_states.pt
[2022-02-05 02:42:55,644] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_118_optim_states.pt
[2022-02-05 02:42:55,763] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16150/zero_pp_rank_0_mp_rank_119_optim_states.pt
successfully saved checkpoint at iteration 16150 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
time (ms) | save-checkpoint: 1707580.32
iteration 16151/ 292968 | consumed samples: 33077248 | consumed tokens: 16706830336 | elapsed time per iteration (ms): 1849363.9 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.714864E+00 | loss scale: 32768.0 | grad norm: 27696.254 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.001 | TFLOPs: 6.81 |
iteration 16152/ 292968 | consumed samples: 33079296 | consumed tokens: 16708780032 | elapsed time per iteration (ms): 136138.5 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.742652E+00 | loss scale: 32768.0 | grad norm: 34899.731 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 92.53 |
iteration 16153/ 292968 | consumed samples: 33081344 | consumed tokens: 16710729728 | elapsed time per iteration (ms): 132648.7 | learning rate: 5.945E-05 | global 
batch size: 2048 | lm loss: 2.721860E+00 | loss scale: 32768.0 | grad norm: 25331.596 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 94.96 | iteration 16154/ 292968 | consumed samples: 33083392 | consumed tokens: 16712679424 | elapsed time per iteration (ms): 138167.3 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.714587E+00 | loss scale: 32768.0 | grad norm: 40114.859 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 91.17 | iteration 16155/ 292968 | consumed samples: 33085440 | consumed tokens: 16714629120 | elapsed time per iteration (ms): 130958.6 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.697772E+00 | loss scale: 32768.0 | grad norm: 24243.523 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.19 | iteration 16156/ 292968 | consumed samples: 33087488 | consumed tokens: 16716578816 | elapsed time per iteration (ms): 132649.1 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.763044E+00 | loss scale: 32768.0 | grad norm: 38883.663 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 94.96 | iteration 16157/ 292968 | consumed samples: 33089536 | consumed tokens: 16718528512 | elapsed time per iteration (ms): 129635.0 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.751317E+00 | loss scale: 32768.0 | grad norm: 33800.791 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.17 | iteration 16158/ 292968 | consumed samples: 33091584 | consumed tokens: 16720478208 | elapsed time per iteration (ms): 130554.2 | learning rate: 
5.945E-05 | global batch size: 2048 | lm loss: 2.793861E+00 | loss scale: 32768.0 | grad norm: 57662.214 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.48 | iteration 16159/ 292968 | consumed samples: 33093632 | consumed tokens: 16722427904 | elapsed time per iteration (ms): 128790.6 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.900736E+00 | loss scale: 32768.0 | grad norm: 94337.612 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.81 | iteration 16160/ 292968 | consumed samples: 33095680 | consumed tokens: 16724377600 | elapsed time per iteration (ms): 130314.8 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.784914E+00 | loss scale: 32768.0 | grad norm: 25132.536 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.66 | iteration 16161/ 292968 | consumed samples: 33097728 | consumed tokens: 16726327296 | elapsed time per iteration (ms): 127557.6 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.749031E+00 | loss scale: 32768.0 | grad norm: 40783.903 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.75 | iteration 16162/ 292968 | consumed samples: 33099776 | consumed tokens: 16728276992 | elapsed time per iteration (ms): 126555.1 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.740225E+00 | loss scale: 32768.0 | grad norm: 25606.683 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.53 | iteration 16163/ 292968 | consumed samples: 33101824 | consumed tokens: 16730226688 | elapsed time per iteration (ms): 129581.9 | 
learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.770096E+00 | loss scale: 32768.0 | grad norm: 31812.461 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.21 | iteration 16164/ 292968 | consumed samples: 33103872 | consumed tokens: 16732176384 | elapsed time per iteration (ms): 128095.3 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.741090E+00 | loss scale: 32768.0 | grad norm: 50895.760 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.34 | iteration 16165/ 292968 | consumed samples: 33105920 | consumed tokens: 16734126080 | elapsed time per iteration (ms): 128281.7 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.739000E+00 | loss scale: 32768.0 | grad norm: 29029.449 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.19 | iteration 16166/ 292968 | consumed samples: 33107968 | consumed tokens: 16736075776 | elapsed time per iteration (ms): 129568.3 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.739569E+00 | loss scale: 32768.0 | grad norm: 54871.963 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.22 | iteration 16167/ 292968 | consumed samples: 33110016 | consumed tokens: 16738025472 | elapsed time per iteration (ms): 126023.3 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.701830E+00 | loss scale: 32768.0 | grad norm: 30856.832 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.95 | iteration 16168/ 292968 | consumed samples: 33112064 | consumed tokens: 16739975168 | elapsed time per iteration (ms): 
128150.5 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.714375E+00 | loss scale: 32768.0 | grad norm: 38329.125 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.29 | iteration 16169/ 292968 | consumed samples: 33114112 | consumed tokens: 16741924864 | elapsed time per iteration (ms): 126294.3 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.727077E+00 | loss scale: 32768.0 | grad norm: 32007.245 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.74 | iteration 16170/ 292968 | consumed samples: 33116160 | consumed tokens: 16743874560 | elapsed time per iteration (ms): 126010.4 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.719868E+00 | loss scale: 32768.0 | grad norm: 34861.757 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.96 | iteration 16171/ 292968 | consumed samples: 33118208 | consumed tokens: 16745824256 | elapsed time per iteration (ms): 127419.5 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.718405E+00 | loss scale: 32768.0 | grad norm: 30410.855 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.86 | iteration 16172/ 292968 | consumed samples: 33120256 | consumed tokens: 16747773952 | elapsed time per iteration (ms): 126997.5 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.704407E+00 | loss scale: 32768.0 | grad norm: 21312.179 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.19 | iteration 16173/ 292968 | consumed samples: 33122304 | consumed tokens: 16749723648 | elapsed time per 
iteration (ms): 125414.4 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.676085E+00 | loss scale: 32768.0 | grad norm: 18006.521 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 100.44 | iteration 16174/ 292968 | consumed samples: 33124352 | consumed tokens: 16751673344 | elapsed time per iteration (ms): 125363.2 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.717337E+00 | loss scale: 32768.0 | grad norm: 25921.996 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 100.48 | iteration 16175/ 292968 | consumed samples: 33126400 | consumed tokens: 16753623040 | elapsed time per iteration (ms): 125484.1 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.709707E+00 | loss scale: 32768.0 | grad norm: 33143.273 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 100.38 | iteration 16176/ 292968 | consumed samples: 33128448 | consumed tokens: 16755572736 | elapsed time per iteration (ms): 125038.5 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.707956E+00 | loss scale: 32768.0 | grad norm: 26537.907 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 100.74 | iteration 16177/ 292968 | consumed samples: 33130496 | consumed tokens: 16757522432 | elapsed time per iteration (ms): 125032.5 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.708268E+00 | loss scale: 32768.0 | grad norm: 19338.347 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 100.74 | iteration 16178/ 292968 | consumed samples: 33132544 | consumed tokens: 16759472128 | 
elapsed time per iteration (ms): 124464.2 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.701108E+00 | loss scale: 32768.0 | grad norm: 22147.224 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 101.20 | iteration 16179/ 292968 | consumed samples: 33134592 | consumed tokens: 16761421824 | elapsed time per iteration (ms): 125775.3 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.693890E+00 | loss scale: 32768.0 | grad norm: 27916.500 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 100.15 | iteration 16180/ 292968 | consumed samples: 33136640 | consumed tokens: 16763371520 | elapsed time per iteration (ms): 124133.9 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.709651E+00 | loss scale: 32768.0 | grad norm: 30165.849 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 101.47 | iteration 16181/ 292968 | consumed samples: 33138688 | consumed tokens: 16765321216 | elapsed time per iteration (ms): 126202.7 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.689394E+00 | loss scale: 32768.0 | grad norm: 31985.558 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.81 | iteration 16182/ 292968 | consumed samples: 33140736 | consumed tokens: 16767270912 | elapsed time per iteration (ms): 124280.7 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.694080E+00 | loss scale: 32768.0 | grad norm: 30076.753 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 101.35 | iteration 16183/ 292968 | consumed samples: 33142784 | consumed tokens: 
16769220608 | elapsed time per iteration (ms): 124020.2 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.704150E+00 | loss scale: 32768.0 | grad norm: 28997.892 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.57 | iteration 16184/ 292968 | consumed samples: 33144832 | consumed tokens: 16771170304 | elapsed time per iteration (ms): 124476.2 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.701893E+00 | loss scale: 32768.0 | grad norm: 21517.794 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 101.20 | iteration 16185/ 292968 | consumed samples: 33146880 | consumed tokens: 16773120000 | elapsed time per iteration (ms): 125503.8 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.693627E+00 | loss scale: 32768.0 | grad norm: 24630.504 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 100.37 | iteration 16186/ 292968 | consumed samples: 33148928 | consumed tokens: 16775069696 | elapsed time per iteration (ms): 123469.0 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.682398E+00 | loss scale: 32768.0 | grad norm: 31137.350 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 102.02 | iteration 16187/ 292968 | consumed samples: 33150976 | consumed tokens: 16777019392 | elapsed time per iteration (ms): 124407.2 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.668045E+00 | loss scale: 32768.0 | grad norm: 26196.437 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 101.25 | iteration 16188/ 292968 | consumed samples: 33153024 | 
consumed tokens: 16778969088 | elapsed time per iteration (ms): 125178.1 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.697686E+00 | loss scale: 32768.0 | grad norm: 16292.104 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 100.63 | iteration 16189/ 292968 | consumed samples: 33155072 | consumed tokens: 16780918784 | elapsed time per iteration (ms): 124776.6 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.693599E+00 | loss scale: 32768.0 | grad norm: 26692.094 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 100.95 | iteration 16190/ 292968 | consumed samples: 33157120 | consumed tokens: 16782868480 | elapsed time per iteration (ms): 124228.5 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.692213E+00 | loss scale: 32768.0 | grad norm: 34018.053 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 101.40 | iteration 16191/ 292968 | consumed samples: 33159168 | consumed tokens: 16784818176 | elapsed time per iteration (ms): 124805.4 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.678258E+00 | loss scale: 32768.0 | grad norm: 27250.414 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 100.93 | iteration 16192/ 292968 | consumed samples: 33161216 | consumed tokens: 16786767872 | elapsed time per iteration (ms): 127629.9 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.692630E+00 | loss scale: 32768.0 | grad norm: 22489.174 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.69 | iteration 16193/ 292968 | consumed 
samples: 33163264 | consumed tokens: 16788717568 | elapsed time per iteration (ms): 126671.9 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.701131E+00 | loss scale: 32768.0 | grad norm: 38602.052 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.44 | iteration 16194/ 292968 | consumed samples: 33165312 | consumed tokens: 16790667264 | elapsed time per iteration (ms): 124471.0 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.698993E+00 | loss scale: 32768.0 | grad norm: 39144.829 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 101.20 | iteration 16195/ 292968 | consumed samples: 33167360 | consumed tokens: 16792616960 | elapsed time per iteration (ms): 124462.0 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.726367E+00 | loss scale: 32768.0 | grad norm: 22746.517 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 101.21 | iteration 16196/ 292968 | consumed samples: 33169408 | consumed tokens: 16794566656 | elapsed time per iteration (ms): 125970.5 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.705364E+00 | loss scale: 32768.0 | grad norm: 30873.143 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.99 | iteration 16197/ 292968 | consumed samples: 33171456 | consumed tokens: 16796516352 | elapsed time per iteration (ms): 127649.3 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.711872E+00 | loss scale: 32768.0 | grad norm: 32885.814 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.68 | iteration 16198/ 
292968 | consumed samples: 33173504 | consumed tokens: 16798466048 | elapsed time per iteration (ms): 126418.1 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.681718E+00 | loss scale: 32768.0 | grad norm: 27837.583 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.64 |
iteration 16199/ 292968 | consumed samples: 33175552 | consumed tokens: 16800415744 | elapsed time per iteration (ms): 123965.2 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.726325E+00 | loss scale: 32768.0 | grad norm: 37486.861 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.61 |
iteration 16200/ 292968 | consumed samples: 33177600 | consumed tokens: 16802365440 | elapsed time per iteration (ms): 123998.0 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.694619E+00 | loss scale: 32768.0 | grad norm: 27411.391 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.59 |
-------------------------------------------------------------------------------------------
valid loss at iteration 16200 | lm loss value: 2.847370E+00 | lm loss PPL: 1.724238E+01 |
-------------------------------------------------------------------------------------------
saving checkpoint at iteration 16200 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
[2022-02-05 04:37:11,066] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/mp_rank_01_model_states.pt
[2022-02-05 04:37:11,126] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/mp_rank_00_model_states.pt
[2022-02-05 04:48:51,976] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-02-05 04:48:52,287] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-02-05 04:48:52,323] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-02-05 04:48:52,390] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-02-05 04:48:52,490] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-02-05 04:48:52,506] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-02-05 04:48:52,886] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-02-05 04:48:53,234] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-02-05 04:48:55,561] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-02-05 04:48:55,962] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-02-05 04:48:55,965] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-02-05 04:48:55,974] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-02-05 04:48:56,011] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-02-05 04:48:56,069] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-02-05 04:48:56,288] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-02-05 04:48:56,334] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-02-05 04:48:56,391] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-02-05 04:48:56,612] [INFO] [engine.py:3007:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-02-05 04:48:56,637] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-02-05 04:48:57,022] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-02-05 04:48:57,321] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-02-05 04:48:57,536] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-02-05 04:48:57,694] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-02-05 04:48:58,091] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-02-05 04:48:58,156] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-02-05 04:48:58,226] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-02-05 04:48:58,451] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-02-05 04:48:58,436] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-02-05 04:48:58,501] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-02-05 04:48:58,552] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-02-05 04:48:58,678] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-02-05 04:48:58,720] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-02-05 04:48:58,871] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-02-05 04:48:58,896] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-02-05 04:48:59,089] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-02-05 
04:48:59,300] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-02-05 04:48:59,334] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-02-05 04:48:59,671] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-02-05 04:48:59,860] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-02-05 04:48:59,916] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-02-05 04:49:00,052] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-02-05 04:49:00,449] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-02-05 04:49:00,543] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-02-05 04:49:00,572] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-02-05 04:49:00,588] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-02-05 04:49:00,609] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-02-05 04:49:00,669] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-02-05 04:49:00,728] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-02-05 04:49:00,829] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-02-05 04:49:01,241] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-02-05 04:49:01,644] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-02-05 04:49:01,650] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-02-05 04:49:02,019] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-02-05 04:49:02,182] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-02-05 04:49:02,725] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-02-05 04:49:02,799] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-02-05 04:49:02,977] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-02-05 04:49:03,114] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-02-05 04:49:03,509] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-02-05 04:49:03,698] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-02-05 04:49:04,033] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_124_optim_states.pt [2022-02-05 04:49:04,770] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-02-05 04:49:04,803] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-02-05 04:49:05,005] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-02-05 04:49:05,009] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-02-05 04:49:05,084] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-02-05 04:49:05,454] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-02-05 04:49:05,516] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-02-05 04:49:05,797] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-02-05 04:49:06,180] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-02-05 
04:49:06,547] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-02-05 04:49:06,550] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-02-05 04:49:06,867] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-02-05 04:49:07,297] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-02-05 04:49:07,342] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-02-05 04:49:07,397] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-02-05 04:49:07,437] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-02-05 04:49:07,456] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-02-05 04:49:07,775] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-02-05 04:49:07,776] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-02-05 04:49:07,960] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-02-05 04:49:08,001] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-02-05 04:49:08,248] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-02-05 04:49:08,763] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-02-05 04:49:08,863] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-02-05 04:49:09,212] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-02-05 04:49:09,248] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-02-05 04:49:09,258] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-02-05 04:49:09,789] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-02-05 04:49:09,977] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-02-05 04:49:09,991] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-02-05 04:49:11,162] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-02-05 04:49:11,439] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-02-05 04:49:11,496] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-02-05 04:49:11,572] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-02-05 04:49:11,725] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-02-05 04:49:11,752] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-02-05 04:49:11,790] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-02-05 04:49:12,782] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-02-05 04:49:12,825] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-02-05 04:49:13,818] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-02-05 04:49:13,938] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-02-05 04:49:15,054] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-02-05 04:49:15,638] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-02-05 04:49:55,800] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-02-05 
04:50:01,331] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-02-05 04:50:06,773] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-02-05 04:50:14,199] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-02-05 04:50:27,197] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-02-05 04:50:27,272] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-02-05 04:50:27,736] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-02-05 04:50:27,837] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-02-05 04:50:27,893] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-02-05 04:50:28,231] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-02-05 04:50:29,950] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-02-05 04:50:29,988] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-02-05 04:50:29,996] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-02-05 04:50:30,018] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-02-05 04:50:31,691] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-02-05 04:50:35,899] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-02-05 04:50:36,002] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-02-05 04:51:30,732] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-02-05 04:51:30,782] [INFO] [engine.py:3007:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-02-05 04:51:30,799] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-02-05 04:51:31,359] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-02-05 04:51:31,522] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-02-05 04:51:32,864] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-02-05 04:51:32,879] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16200/zero_pp_rank_0_mp_rank_119_optim_states.pt successfully saved checkpoint at iteration 16200 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 time (ms) | save-checkpoint: 972044.37 iteration 16201/ 292968 | consumed samples: 33179648 | consumed tokens: 16804315136 | elapsed time per iteration (ms): 1493062.7 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.712639E+00 | loss scale: 32768.0 | grad norm: 18945.870 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.001 | TFLOPs: 8.44 | iteration 16202/ 292968 | consumed samples: 33181696 | consumed tokens: 16806264832 | elapsed time per iteration (ms): 134049.1 | learning rate: 5.945E-05 | 
global batch size: 2048 | lm loss: 2.678382E+00 | loss scale: 32768.0 | grad norm: 24562.603 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 93.97 | iteration 16203/ 292968 | consumed samples: 33183744 | consumed tokens: 16808214528 | elapsed time per iteration (ms): 133189.3 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.685820E+00 | loss scale: 32768.0 | grad norm: 36251.012 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 94.58 | iteration 16204/ 292968 | consumed samples: 33185792 | consumed tokens: 16810164224 | elapsed time per iteration (ms): 130091.4 | learning rate: 5.945E-05 | global batch size: 2048 | lm loss: 2.682349E+00 | loss scale: 32768.0 | grad norm: 29713.193 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.83 | iteration 16205/ 292968 | consumed samples: 33187840 | consumed tokens: 16812113920 | elapsed time per iteration (ms): 130033.1 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.677633E+00 | loss scale: 32768.0 | grad norm: 28340.336 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.87 | iteration 16206/ 292968 | consumed samples: 33189888 | consumed tokens: 16814063616 | elapsed time per iteration (ms): 129248.7 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.704214E+00 | loss scale: 32768.0 | grad norm: 28414.740 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.46 | iteration 16207/ 292968 | consumed samples: 33191936 | consumed tokens: 16816013312 | elapsed time per iteration (ms): 128134.7 | learning rate: 
5.944E-05 | global batch size: 2048 | lm loss: 2.702789E+00 | loss scale: 32768.0 | grad norm: 29409.156 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.31 | iteration 16208/ 292968 | consumed samples: 33193984 | consumed tokens: 16817963008 | elapsed time per iteration (ms): 128033.5 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.710036E+00 | loss scale: 32768.0 | grad norm: 28073.932 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.38 | iteration 16209/ 292968 | consumed samples: 33196032 | consumed tokens: 16819912704 | elapsed time per iteration (ms): 127887.3 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.681766E+00 | loss scale: 32768.0 | grad norm: 30798.698 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.50 | iteration 16210/ 292968 | consumed samples: 33198080 | consumed tokens: 16821862400 | elapsed time per iteration (ms): 128390.7 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.702747E+00 | loss scale: 32768.0 | grad norm: 31580.175 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.11 | iteration 16211/ 292968 | consumed samples: 33200128 | consumed tokens: 16823812096 | elapsed time per iteration (ms): 127335.5 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.721744E+00 | loss scale: 32768.0 | grad norm: 30064.390 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.92 | iteration 16212/ 292968 | consumed samples: 33202176 | consumed tokens: 16825761792 | elapsed time per iteration (ms): 127778.5 | 
learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.702151E+00 | loss scale: 32768.0 | grad norm: 23668.341 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.58 | iteration 16213/ 292968 | consumed samples: 33204224 | consumed tokens: 16827711488 | elapsed time per iteration (ms): 128676.1 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.697730E+00 | loss scale: 32768.0 | grad norm: 29451.642 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.89 | iteration 16214/ 292968 | consumed samples: 33206272 | consumed tokens: 16829661184 | elapsed time per iteration (ms): 127172.6 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.683867E+00 | loss scale: 32768.0 | grad norm: 31617.648 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.05 | iteration 16215/ 292968 | consumed samples: 33208320 | consumed tokens: 16831610880 | elapsed time per iteration (ms): 127168.5 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.701441E+00 | loss scale: 32768.0 | grad norm: 32742.263 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.05 | iteration 16216/ 292968 | consumed samples: 33210368 | consumed tokens: 16833560576 | elapsed time per iteration (ms): 126471.4 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.715740E+00 | loss scale: 32768.0 | grad norm: 34870.099 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.60 | iteration 16217/ 292968 | consumed samples: 33212416 | consumed tokens: 16835510272 | elapsed time per iteration (ms): 
126929.5 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.685089E+00 | loss scale: 32768.0 | grad norm: 24229.706 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.24 | iteration 16218/ 292968 | consumed samples: 33214464 | consumed tokens: 16837459968 | elapsed time per iteration (ms): 127644.8 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.679208E+00 | loss scale: 32768.0 | grad norm: 24995.152 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.68 | iteration 16219/ 292968 | consumed samples: 33216512 | consumed tokens: 16839409664 | elapsed time per iteration (ms): 128268.3 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.676067E+00 | loss scale: 32768.0 | grad norm: 27086.244 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.20 | iteration 16220/ 292968 | consumed samples: 33218560 | consumed tokens: 16841359360 | elapsed time per iteration (ms): 128210.1 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.684192E+00 | loss scale: 32768.0 | grad norm: 28036.621 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.25 | iteration 16221/ 292968 | consumed samples: 33220608 | consumed tokens: 16843309056 | elapsed time per iteration (ms): 128504.8 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.685871E+00 | loss scale: 32768.0 | grad norm: 26300.752 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.02 | iteration 16222/ 292968 | consumed samples: 33222656 | consumed tokens: 16845258752 | elapsed time per 
iteration (ms): 128673.7 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.664461E+00 | loss scale: 32768.0 | grad norm: 21900.640 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.89 | iteration 16223/ 292968 | consumed samples: 33224704 | consumed tokens: 16847208448 | elapsed time per iteration (ms): 127839.8 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.701643E+00 | loss scale: 32768.0 | grad norm: 22605.489 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.53 | iteration 16224/ 292968 | consumed samples: 33226752 | consumed tokens: 16849158144 | elapsed time per iteration (ms): 128078.8 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.714240E+00 | loss scale: 32768.0 | grad norm: 26555.927 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.35 | iteration 16225/ 292968 | consumed samples: 33228800 | consumed tokens: 16851107840 | elapsed time per iteration (ms): 127419.3 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.691277E+00 | loss scale: 32768.0 | grad norm: 27703.610 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.86 | iteration 16226/ 292968 | consumed samples: 33230848 | consumed tokens: 16853057536 | elapsed time per iteration (ms): 127035.5 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.695894E+00 | loss scale: 32768.0 | grad norm: 32882.867 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.16 | iteration 16227/ 292968 | consumed samples: 33232896 | consumed tokens: 16855007232 | 
elapsed time per iteration (ms): 125988.5 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.679167E+00 | loss scale: 32768.0 | grad norm: 34506.891 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.98 | iteration 16228/ 292968 | consumed samples: 33234944 | consumed tokens: 16856956928 | elapsed time per iteration (ms): 126597.5 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.684859E+00 | loss scale: 32768.0 | grad norm: 32916.456 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.50 | iteration 16229/ 292968 | consumed samples: 33236992 | consumed tokens: 16858906624 | elapsed time per iteration (ms): 125893.8 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.705697E+00 | loss scale: 32768.0 | grad norm: 28199.918 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 100.06 | iteration 16230/ 292968 | consumed samples: 33239040 | consumed tokens: 16860856320 | elapsed time per iteration (ms): 125258.4 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.698322E+00 | loss scale: 32768.0 | grad norm: 24429.710 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 100.56 | iteration 16231/ 292968 | consumed samples: 33241088 | consumed tokens: 16862806016 | elapsed time per iteration (ms): 126184.5 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.683692E+00 | loss scale: 32768.0 | grad norm: 20354.859 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.83 | iteration 16232/ 292968 | consumed samples: 33243136 | consumed tokens: 
16864755712 | elapsed time per iteration (ms): 124391.3 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.685346E+00 | loss scale: 32768.0 | grad norm: 23138.941 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 101.26 | iteration 16233/ 292968 | consumed samples: 33245184 | consumed tokens: 16866705408 | elapsed time per iteration (ms): 123624.8 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.665440E+00 | loss scale: 32768.0 | grad norm: 26209.383 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.89 | iteration 16234/ 292968 | consumed samples: 33247232 | consumed tokens: 16868655104 | elapsed time per iteration (ms): 123684.5 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.695575E+00 | loss scale: 32768.0 | grad norm: 29317.217 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.84 | iteration 16235/ 292968 | consumed samples: 33249280 | consumed tokens: 16870604800 | elapsed time per iteration (ms): 125693.8 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.674866E+00 | loss scale: 32768.0 | grad norm: 29084.803 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 100.21 | iteration 16236/ 292968 | consumed samples: 33251328 | consumed tokens: 16872554496 | elapsed time per iteration (ms): 126056.8 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.696431E+00 | loss scale: 32768.0 | grad norm: 29816.156 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.93 | iteration 16237/ 292968 | consumed samples: 33253376 | 
consumed tokens: 16874504192 | elapsed time per iteration (ms): 126659.3 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.702888E+00 | loss scale: 32768.0 | grad norm: 26752.970 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.45 | iteration 16238/ 292968 | consumed samples: 33255424 | consumed tokens: 16876453888 | elapsed time per iteration (ms): 127137.7 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.689605E+00 | loss scale: 32768.0 | grad norm: 23157.562 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.08 | iteration 16239/ 292968 | consumed samples: 33257472 | consumed tokens: 16878403584 | elapsed time per iteration (ms): 127631.9 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.666304E+00 | loss scale: 32768.0 | grad norm: 26192.678 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.69 | iteration 16240/ 292968 | consumed samples: 33259520 | consumed tokens: 16880353280 | elapsed time per iteration (ms): 125522.7 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.699692E+00 | loss scale: 32768.0 | grad norm: 32486.226 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 100.35 | iteration 16241/ 292968 | consumed samples: 33261568 | consumed tokens: 16882302976 | elapsed time per iteration (ms): 126598.6 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.711889E+00 | loss scale: 32768.0 | grad norm: 40818.696 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.50 | iteration 16242/ 292968 | consumed samples: 
33263616 | consumed tokens: 16884252672 | elapsed time per iteration (ms): 126517.0 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.688369E+00 | loss scale: 32768.0 | grad norm: 30572.791 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.56 | iteration 16243/ 292968 | consumed samples: 33265664 | consumed tokens: 16886202368 | elapsed time per iteration (ms): 126467.3 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.711939E+00 | loss scale: 32768.0 | grad norm: 31055.611 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.60 | iteration 16244/ 292968 | consumed samples: 33267712 | consumed tokens: 16888152064 | elapsed time per iteration (ms): 127085.8 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.709806E+00 | loss scale: 32768.0 | grad norm: 41909.010 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.12 | iteration 16245/ 292968 | consumed samples: 33269760 | consumed tokens: 16890101760 | elapsed time per iteration (ms): 128332.9 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.719176E+00 | loss scale: 32768.0 | grad norm: 35305.902 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.15 | iteration 16246/ 292968 | consumed samples: 33271808 | consumed tokens: 16892051456 | elapsed time per iteration (ms): 125556.8 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.675266E+00 | loss scale: 32768.0 | grad norm: 30871.632 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 100.32 | iteration 16247/ 292968 | 
consumed samples: 33273856 | consumed tokens: 16894001152 | elapsed time per iteration (ms): 124042.8 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.705204E+00 | loss scale: 32768.0 | grad norm: 34738.256 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.017 | TFLOPs: 101.55 | iteration 16248/ 292968 | consumed samples: 33275904 | consumed tokens: 16895950848 | elapsed time per iteration (ms): 125288.7 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.717340E+00 | loss scale: 32768.0 | grad norm: 21031.035 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 100.54 | iteration 16249/ 292968 | consumed samples: 33277952 | consumed tokens: 16897900544 | elapsed time per iteration (ms): 125127.2 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.687052E+00 | loss scale: 32768.0 | grad norm: 18349.590 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 100.67 | iteration 16250/ 292968 | consumed samples: 33280000 | consumed tokens: 16899850240 | elapsed time per iteration (ms): 126857.4 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.685069E+00 | loss scale: 32768.0 | grad norm: 31864.024 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.30 | saving checkpoint at iteration 16250 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 [2022-02-05 06:38:23,427] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/mp_rank_00_model_states.pt [2022-02-05 06:38:23,649] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/mp_rank_01_model_states.pt [2022-02-05 06:43:25,046] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-02-05 06:43:25,473] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-02-05 06:43:25,625] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-02-05 06:43:26,483] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-02-05 06:43:26,485] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-02-05 06:43:26,530] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-02-05 06:43:26,624] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-02-05 06:43:27,003] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-02-05 06:43:27,086] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint 
saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-02-05 06:43:27,592] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-02-05 06:43:27,602] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-02-05 06:43:27,729] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-02-05 06:43:27,897] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-02-05 06:43:27,942] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-02-05 06:43:28,015] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-02-05 06:43:28,060] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-02-05 06:43:28,206] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-02-05 06:43:28,257] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-02-05 06:43:28,334] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-02-05 06:43:28,454] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-02-05 06:43:28,561] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-02-05 06:43:28,667] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-02-05 06:43:28,740] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-02-05 06:43:28,803] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-02-05 06:43:28,973] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-02-05 06:43:29,124] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-02-05 
06:43:29,131] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-02-05 06:43:29,221] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-02-05 06:43:29,185] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-02-05 06:43:29,240] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-02-05 06:43:29,231] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-02-05 06:43:29,359] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-02-05 06:43:29,363] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-02-05 06:43:29,376] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-02-05 06:43:29,481] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-02-05 06:43:29,590] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-02-05 06:43:29,904] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-02-05 06:43:30,033] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-02-05 06:43:30,057] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-02-05 06:43:30,264] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-02-05 06:43:30,282] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-02-05 06:43:30,385] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-02-05 06:43:30,455] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-02-05 06:43:30,610] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-02-05 06:43:30,472] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-02-05 06:43:30,817] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-02-05 06:43:31,270] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-02-05 06:43:31,775] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-02-05 06:43:32,180] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-02-05 06:43:32,190] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-02-05 06:43:32,223] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-02-05 06:43:32,284] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-02-05 06:43:32,325] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-02-05 06:43:33,233] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-02-05 06:43:33,597] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-02-05 06:43:33,656] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-02-05 06:43:33,676] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-02-05 06:43:34,049] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-02-05 06:43:34,621] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-02-05 06:43:34,862] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-02-05 06:43:34,886] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-02-05 
06:43:34,906] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-02-05 06:43:34,901] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-02-05 06:43:35,273] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-02-05 06:43:35,458] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-02-05 06:43:35,514] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-02-05 06:43:35,672] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-02-05 06:43:35,669] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-02-05 06:43:35,791] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-02-05 06:43:35,707] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-02-05 06:43:35,714] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-02-05 06:43:35,810] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-02-05 06:43:35,837] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-02-05 06:43:36,101] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-02-05 06:43:36,139] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-02-05 06:43:36,250] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-02-05 06:43:36,308] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-02-05 06:43:36,684] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-02-05 06:43:37,083] [INFO] [engine.py:3007:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-02-05 06:43:37,344] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-02-05 06:43:37,539] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-02-05 06:43:38,238] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-02-05 06:43:38,794] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-02-05 06:43:39,124] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-02-05 06:43:39,174] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-02-05 06:43:39,444] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-02-05 06:43:40,333] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-02-05 06:43:41,725] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-02-05 06:43:41,883] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-02-05 06:43:42,021] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_124_optim_states.pt [2022-02-05 06:43:42,088] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-02-05 06:43:42,395] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-02-05 06:43:47,708] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-02-05 06:43:47,719] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-02-05 06:43:51,354] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-02-05 06:43:51,385] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-02-05 
06:44:17,745] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-02-05 06:44:42,928] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-02-05 06:44:44,530] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-02-05 06:44:49,912] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-02-05 06:46:20,777] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-02-05 06:46:45,409] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-02-05 06:46:47,027] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-02-05 06:46:57,745] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-02-05 06:48:38,665] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-02-05 06:48:52,706] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-02-05 06:48:57,415] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-02-05 06:49:18,441] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-02-05 06:50:58,034] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-02-05 06:51:03,029] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-02-05 06:51:10,641] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-02-05 06:51:32,085] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-02-05 06:51:39,016] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-02-05 06:52:19,711] [INFO] [engine.py:3007:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-02-05 06:53:04,892] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-02-05 06:53:34,451] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-02-05 06:53:45,817] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-02-05 06:54:22,943] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-02-05 06:54:55,001] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-02-05 06:54:55,247] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-02-05 06:55:05,605] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-02-05 06:55:45,827] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-02-05 06:56:02,624] [INFO] 
[engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-02-05 06:56:03,625] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-02-05 06:56:08,177] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-02-05 06:57:33,568] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-02-05 06:57:50,026] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-02-05 06:59:35,475] [INFO] [engine.py:3007:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16250/zero_pp_rank_0_mp_rank_118_optim_states.pt successfully saved checkpoint at iteration 16250 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 time (ms) | save-checkpoint: 1306717.06 iteration 16251/ 292968 | consumed samples: 33282048 | consumed tokens: 16901799936 | elapsed time per iteration (ms): 1443307.9 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.688425E+00 | loss scale: 32768.0 | grad norm: 34743.409 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.001 | TFLOPs: 8.73 | iteration 16252/ 292968 | consumed samples: 33284096 | consumed tokens: 16903749632 | elapsed time per iteration (ms): 
133504.9 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.694449E+00 | loss scale: 32768.0 | grad norm: 29277.026 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 94.35 | iteration 16253/ 292968 | consumed samples: 33286144 | consumed tokens: 16905699328 | elapsed time per iteration (ms): 130123.8 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.676714E+00 | loss scale: 32768.0 | grad norm: 23663.268 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.80 | iteration 16254/ 292968 | consumed samples: 33288192 | consumed tokens: 16907649024 | elapsed time per iteration (ms): 129452.3 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.690243E+00 | loss scale: 32768.0 | grad norm: 16444.351 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.31 | iteration 16255/ 292968 | consumed samples: 33290240 | consumed tokens: 16909598720 | elapsed time per iteration (ms): 127975.4 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.684718E+00 | loss scale: 32768.0 | grad norm: 21941.427 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.43 | iteration 16256/ 292968 | consumed samples: 33292288 | consumed tokens: 16911548416 | elapsed time per iteration (ms): 127035.4 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.705424E+00 | loss scale: 32768.0 | grad norm: 28101.418 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.16 | iteration 16257/ 292968 | consumed samples: 33294336 | consumed tokens: 16913498112 | elapsed time per 
iteration (ms): 126796.7 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.680789E+00 | loss scale: 32768.0 | grad norm: 33965.476 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.34 | iteration 16258/ 292968 | consumed samples: 33296384 | consumed tokens: 16915447808 | elapsed time per iteration (ms): 128098.6 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.687312E+00 | loss scale: 32768.0 | grad norm: 34252.004 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.33 | iteration 16259/ 292968 | consumed samples: 33298432 | consumed tokens: 16917413888 | elapsed time per iteration (ms): 128518.7 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.699821E+00 | loss scale: 32768.0 | grad norm: 31533.413 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.84 | iteration 16260/ 292968 | consumed samples: 33300480 | consumed tokens: 16919379968 | elapsed time per iteration (ms): 126513.5 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.677754E+00 | loss scale: 32768.0 | grad norm: 29889.496 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 100.40 | iteration 16261/ 292968 | consumed samples: 33302528 | consumed tokens: 16921346048 | elapsed time per iteration (ms): 127250.1 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.665891E+00 | loss scale: 32768.0 | grad norm: 23558.137 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.82 | iteration 16262/ 292968 | consumed samples: 33304576 | consumed tokens: 16923312128 | 
elapsed time per iteration (ms): 127916.8 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.679013E+00 | loss scale: 32768.0 | grad norm: 21268.797 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.30 | iteration 16263/ 292968 | consumed samples: 33306624 | consumed tokens: 16925278208 | elapsed time per iteration (ms): 128953.8 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.715065E+00 | loss scale: 32768.0 | grad norm: 27357.783 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.50 | iteration 16264/ 292968 | consumed samples: 33308672 | consumed tokens: 16927244288 | elapsed time per iteration (ms): 129362.5 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.685831E+00 | loss scale: 32768.0 | grad norm: 39075.383 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.19 | iteration 16265/ 292968 | consumed samples: 33310720 | consumed tokens: 16929210368 | elapsed time per iteration (ms): 127688.7 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.700187E+00 | loss scale: 32768.0 | grad norm: 32229.198 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.48 | iteration 16266/ 292968 | consumed samples: 33312768 | consumed tokens: 16931176448 | elapsed time per iteration (ms): 132250.0 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.697684E+00 | loss scale: 32768.0 | grad norm: 36599.320 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 96.05 | iteration 16267/ 292968 | consumed samples: 33314816 | consumed tokens: 
16933142528 | elapsed time per iteration (ms): 130160.9 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.720734E+00 | loss scale: 32768.0 | grad norm: 26458.685 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.59 | iteration 16268/ 292968 | consumed samples: 33316864 | consumed tokens: 16935108608 | elapsed time per iteration (ms): 131592.0 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.695192E+00 | loss scale: 32768.0 | grad norm: 26421.265 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 96.53 | iteration 16269/ 292968 | consumed samples: 33318912 | consumed tokens: 16937074688 | elapsed time per iteration (ms): 128978.3 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.685990E+00 | loss scale: 32768.0 | grad norm: 29171.427 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.48 | iteration 16270/ 292968 | consumed samples: 33320960 | consumed tokens: 16939040768 | elapsed time per iteration (ms): 128284.8 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.699887E+00 | loss scale: 32768.0 | grad norm: 40307.297 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.02 | iteration 16271/ 292968 | consumed samples: 33323008 | consumed tokens: 16941006848 | elapsed time per iteration (ms): 127533.5 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.703909E+00 | loss scale: 32768.0 | grad norm: 23971.198 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 99.60 | iteration 16272/ 292968 | consumed samples: 33325056 | 
consumed tokens: 16942972928 | elapsed time per iteration (ms): 130338.0 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.695987E+00 | loss scale: 32768.0 | grad norm: 30330.922 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.46 | iteration 16273/ 292968 | consumed samples: 33327104 | consumed tokens: 16944939008 | elapsed time per iteration (ms): 129649.7 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.721024E+00 | loss scale: 32768.0 | grad norm: 39043.531 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 97.97 | iteration 16274/ 292968 | consumed samples: 33329152 | consumed tokens: 16946905088 | elapsed time per iteration (ms): 132792.2 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.678114E+00 | loss scale: 32768.0 | grad norm: 24161.252 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 95.65 | iteration 16275/ 292968 | consumed samples: 33331200 | consumed tokens: 16948871168 | elapsed time per iteration (ms): 132996.6 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.650754E+00 | loss scale: 32768.0 | grad norm: 29004.768 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 95.51 | iteration 16276/ 292968 | consumed samples: 33333248 | consumed tokens: 16950837248 | elapsed time per iteration (ms): 132807.2 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.713070E+00 | loss scale: 32768.0 | grad norm: 40143.433 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.015 | TFLOPs: 95.64 | iteration 16277/ 292968 | consumed samples: 
33335296 | consumed tokens: 16952803328 | elapsed time per iteration (ms): 129030.4 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.682552E+00 | loss scale: 32768.0 | grad norm: 16453.044 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.016 | TFLOPs: 98.44 | srun: Job step aborted: Waiting up to 62 seconds for job step to finish. slurmstepd: error: *** STEP 1737098.0 ON jean-zay-iam01 CANCELLED AT 2022-02-05T07:58:50 *** Killing subprocess 328553 Killing subprocess 328554 Killing subprocess 328555 Killing subprocess 328842 Killing subprocess 328843 Killing subprocess 328844 Killing subprocess 328556 Killing subprocess 328558 Killing subprocess 328560 Killing subprocess 328562 Killing subprocess 328564 Main process received SIGTERM, exiting Killing subprocess 328845 Killing subprocess 327348 Killing subprocess 328847 Killing subprocess 326777 Killing subprocess 371951 Killing subprocess 331497 Killing subprocess 326778 Killing subprocess 371952 Killing subprocess 327349 Killing subprocess 331498 Killing subprocess 326779 Killing subprocess 371953 Killing subprocess 331499 Killing subprocess 327350 Killing subprocess 326780 Killing subprocess 326782 Killing subprocess 331662 Killing subprocess 326784 Killing subprocess 326787 Killing subprocess 331663 Killing subprocess 326789 Killing subprocess 331664 Main process received SIGTERM, exiting Killing subprocess 371954 Killing subprocess 371956 Killing subprocess 371958 Killing subprocess 351105 Killing subprocess 371960 Killing subprocess 331500 Killing subprocess 331502 Killing subprocess 331505 Killing subprocess 331507 Killing subprocess 331509 Main process received SIGTERM, exiting Killing subprocess 328849 Killing subprocess 328852 Killing subprocess 328854 Killing subprocess 327351 Main process received SIGTERM, exiting Killing subprocess 327353 Killing subprocess 327356 Killing subprocess 327358 Killing subprocess 
327360 Killing subprocess 351109 Killing subprocess 331621 Killing subprocess 351110 Killing subprocess 326518 Killing subprocess 331622 Killing subprocess 326519 Killing subprocess 328070 Killing subprocess 331623 Killing subprocess 330123 Killing subprocess 326520 Killing subprocess 351112 Killing subprocess 351114 Killing subprocess 351116 Killing subprocess 331665 Killing subprocess 328505 Killing subprocess 351118 Killing subprocess 331668 Killing subprocess 328071 Killing subprocess 331670 Killing subprocess 331672 Killing subprocess 331674 Main process received SIGTERM, exiting Killing subprocess 330124 Killing subprocess 328506 Killing subprocess 328072 Killing subprocess 330125 Killing subprocess 328507 Killing subprocess 331624 Killing subprocess 331628 Killing subprocess 331631 Killing subprocess 331633 Killing subprocess 326565 Killing subprocess 326554 Killing subprocess 326566 Killing subprocess 326555 Killing subprocess 326556 Killing subprocess 326567 Killing subprocess 330126 Killing subprocess 330129 Killing subprocess 330131 Killing subprocess 330133 Killing subprocess 326521 Killing subprocess 326557 Killing subprocess 326524 Killing subprocess 326559 Killing subprocess 326525 Killing subprocess 326561 Killing subprocess 326528 Main process received SIGTERM, exiting Killing subprocess 326563 Killing subprocess 326531 Main process received SIGTERM, exiting Killing subprocess 326568 Killing subprocess 326570 Killing subprocess 328074 Killing subprocess 326573 Killing subprocess 328075 Killing subprocess 326575 Killing subprocess 328077 Killing subprocess 326577 Killing subprocess 328081 Killing subprocess 328083 Main process received SIGTERM, exiting Killing subprocess 328508 Killing subprocess 328510 Killing subprocess 328512 Killing subprocess 328515 Killing subprocess 328517 Main process received SIGTERM, exiting Killing subprocess 371963 Main process received SIGTERM, exiting Killing subprocess 331635 Main process received SIGTERM, exiting 
Main process received SIGTERM, exiting Killing subprocess 330135 Main process received SIGTERM, exiting Killing subprocess 351120 Main process received SIGTERM, exiting Killing subprocess 326565 Main process received SIGTERM, exiting Killing subprocess 1609232 Killing subprocess 1609233 Killing subprocess 1609234 Killing subprocess 1609236 Killing subprocess 1609238 Killing subprocess 1609241 Killing subprocess 1609243 Killing subprocess 1609245 Main process received SIGTERM, exiting /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun. Note that --use_env is set by default in torchrun. If your script expects `--local_rank` argument to be set, please change it to read from `os.environ['LOCAL_RANK']` instead. See https://pytorch.org/docs/stable/distributed.html#launch-utility for further instructions warnings.warn( /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun. Note that --use_env is set by default in torchrun. If your script expects `--local_rank` argument to be set, please change it to read from `os.environ['LOCAL_RANK']` instead. See https://pytorch.org/docs/stable/distributed.html#launch-utility for further instructions warnings.warn( /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun. Note that --use_env is set by default in torchrun. If your script expects `--local_rank` argument to be set, please change it to read from `os.environ['LOCAL_RANK']` instead. 
See https://pytorch.org/docs/stable/distributed.html#launch-utility for further instructions warnings.warn( /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun. Note that --use_env is set by default in torchrun. If your script expects `--local_rank` argument to be set, please change it to read from `os.environ['LOCAL_RANK']` instead. See https://pytorch.org/docs/stable/distributed.html#launch-utility for further instructions warnings.warn( WARNING:torch.distributed.run: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** WARNING:torch.distributed.run: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** WARNING:torch.distributed.run: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun. Note that --use_env is set by default in torchrun. If your script expects `--local_rank` argument to be set, please change it to read from `os.environ['LOCAL_RANK']` instead. 
See https://pytorch.org/docs/stable/distributed.html#launch-utility for further instructions warnings.warn( WARNING:torch.distributed.run: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** WARNING:torch.distributed.run: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun. Note that --use_env is set by default in torchrun. If your script expects `--local_rank` argument to be set, please change it to read from `os.environ['LOCAL_RANK']` instead. See https://pytorch.org/docs/stable/distributed.html#launch-utility for further instructions warnings.warn( WARNING:torch.distributed.run: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** WARNING:torch.distributed.run: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
***************************************** /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun. Note that --use_env is set by default in torchrun. If your script expects `--local_rank` argument to be set, please change it to read from `os.environ['LOCAL_RANK']` instead. See https://pytorch.org/docs/stable/distributed.html#launch-utility for further instructions warnings.warn( WARNING:torch.distributed.run: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** WARNING:torch.distributed.run: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun. Note that --use_env is set by default in torchrun. If your script expects `--local_rank` argument to be set, please change it to read from `os.environ['LOCAL_RANK']` instead. See https://pytorch.org/docs/stable/distributed.html#launch-utility for further instructions warnings.warn( WARNING:torch.distributed.run: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
***************************************** using world size: 128, data-parallel-size: 1, tensor-model-parallel size: 4, pipeline-model-parallel size: 32 using torch.float16 for parameters ... ------------------------ arguments ------------------------ accumulate_allreduce_grads_in_fp32 .............. False adam_beta1 ...................................... 0.9 adam_beta2 ...................................... 0.95 adam_eps ........................................ 1e-08 adlr_autoresume ................................. False adlr_autoresume_interval ........................ 1000 apply_query_key_layer_scaling ................... True apply_residual_connection_post_layernorm ........ False attention_dropout ............................... 0.1 attention_softmax_in_fp32 ....................... False bert_binary_head ................................ True bert_load ....................................... None bf16 ............................................ False bias_dropout_fusion ............................. True bias_gelu_fusion ................................ True biencoder_projection_dim ........................ 0 biencoder_shared_query_context_model ............ False block_data_path ................................. None checkpoint_activations .......................... True checkpoint_in_cpu ............................... False checkpoint_num_layers ........................... 1 clip_grad ....................................... 1.0 codecarbon_dir .................................. None consumed_train_samples .......................... 0 consumed_train_tokens ........................... 0 consumed_valid_samples .......................... 0 contigious_checkpointing ........................ False cpu_optimizer ................................... False cpu_torch_adam .................................. False curriculum_learning ............................. False data_impl ....................................... mmap data_parallel_size .............................. 
1 data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document'] dataloader_type ................................. single DDP_impl ........................................ local decoder_seq_length .............................. None deepscale ....................................... False deepscale_config ................................ None deepspeed ....................................... True deepspeed_activation_checkpointing .............. True deepspeed_config ................................ ./ds_config.1747321.json deepspeed_mpi ................................... False distribute_checkpointed_activations ............. False distributed_backend ............................. nccl embed_layernorm ................................. True embedding_path .................................. None encoder_seq_length .............................. 2048 eod_mask_loss ................................... False eval_interval ................................... 150 eval_iters ...................................... 5 eval_only ....................................... None evidence_data_path .............................. None exit_duration_in_mins ........................... 1185 exit_interval ................................... None ffn_hidden_size ................................. 46400 finetune ........................................ False fp16 ............................................ True fp16_lm_cross_entropy ........................... False fp32_residual_connection ........................ False gigaflos_no_embeds .............................. 0 global_batch_size ............................... 2048 glu_activation .................................. None hidden_dropout .................................. 0.1 hidden_size ..................................... 11600 hysteresis ...................................... 2 ict_head_size ................................... 
None ict_load ........................................ None img_dim ......................................... 224 indexer_batch_size .............................. 128 indexer_log_interval ............................ 1000 init_method_std ................................. 0.006 init_method_xavier_uniform ...................... False initial_loss_scale .............................. 4294967296 kv_channels ..................................... 145 layernorm_epsilon ............................... 1e-05 lazy_mpu_init ................................... None load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 local_rank ...................................... 0 log_batch_size_to_tensorboard ................... True log_interval .................................... 1 log_learning_rate_to_tensorboard ................ True log_level ....................................... None log_level_replica ............................... None log_loss_scale_to_tensorboard ................... True log_num_zeros_in_grad ........................... False log_params_norm ................................. False log_timers_to_tensorboard ....................... True log_validation_ppl_to_tensorboard ............... True loss_on_targets_only ............................ False loss_scale ...................................... 12.0 loss_scale_window ............................... 1000 lr .............................................. 6e-05 lr_decay_iters .................................. None lr_decay_samples ................................ None lr_decay_style .................................. cosine lr_decay_tokens ................................. 260000000000 lr_warmup_fraction .............................. None lr_warmup_iters ................................. 0 lr_warmup_samples ............................... 3750000 make_vocab_size_divisible_by .................... 
128 mask_prob ....................................... 0.15 masked_softmax_fusion ........................... True max_position_embeddings ......................... 2048 memory_centric_tiled_linear ..................... False merge_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt micro_batch_size ................................ 1 min_loss_scale .................................. 1.0 min_lr .......................................... 6e-06 mmap_warmup ..................................... False no_load_optim ................................... None no_load_rng ..................................... None no_save_optim ................................... None no_save_rng ..................................... None num_attention_heads ............................. 80 num_channels .................................... 3 num_classes ..................................... 1000 num_layers ...................................... 64 num_layers_per_virtual_pipeline_stage ........... None num_workers ..................................... 2 onnx_safe ....................................... None openai_gelu ..................................... False optimizer ....................................... adam override_lr_scheduler ........................... False params_dtype .................................... torch.float16 partition_activations ........................... False patch_dim ....................................... 16 pipeline_model_parallel_size .................... 32 position_embedding_type ......................... PositionEmbeddingType.absolute profile_backward ................................ False query_in_block_prob ............................. 0.1 rampup_batch_size ............................... None rank ............................................ 0 remote_device ................................... none reset_attention_mask ............................ 
False reset_position_ids .............................. False retriever_report_topk_accuracies ................ [] retriever_score_scaling ......................... False retriever_seq_length ............................ 256 reweight_loss_based_on_position_frequency ....... False sample_rate ..................................... 1.0 save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 save_interval ................................... 50 scatter_gather_tensors_in_pipeline .............. True scattered_embeddings ............................ False seed ............................................ 43 seq_length ...................................... 2048 sgd_momentum .................................... 0.9 short_seq_prob .................................. 0.1 skip_train_iteration_range ...................... None split ........................................... 949,50,1 split_transformers .............................. False synchronize_each_layer .......................... False tensor_model_parallel_size ...................... 4 tensorboard_debug_dir ........................... None tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs//tensorboard/cl-a100 tensorboard_log_interval ........................ 1 tensorboard_queue_size .......................... 5 test_weighted_split_names ....................... None test_weighted_split_paths ....................... None test_weighted_split_splits ...................... None test_weighted_split_weights ..................... None tile_factor ..................................... 1 titles_data_path ................................ None tokenizer_name_or_path .......................... None tokenizer_type .................................. GPT2BPETokenizer train_iters ..................................... None train_samples ................................... 
600000000 train_tokens .................................... 300000000000 train_weighted_split_paths ...................... None use_bnb_optimizer ............................... False use_checkpoint_lr_scheduler ..................... False use_contiguous_buffers_in_ddp ................... False use_cpu_initialization .......................... None use_one_sent_docs ............................... False use_pin_memory .................................. False valid_weighted_split_names ...................... None valid_weighted_split_paths ...................... None valid_weighted_split_splits ..................... None valid_weighted_split_weights .................... None virtual_pipeline_model_parallel_size ............ None vocab_extra_ids ................................. 0 vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json weight_decay .................................... 0.1 world_size ...................................... 128 zero_allgather_bucket_size ...................... 0.0 zero_contigious_gradients ....................... False zero_reduce_bucket_size ......................... 0.0 zero_reduce_scatter ............................. False zero_stage ...................................... 1 -------------------- end of arguments --------------------- setting number of micro-batches to constant 2048 > building GPT2BPETokenizer tokenizer ... > padded vocab (size: 50257) with 431 dummy tokens (new size: 50688) -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. 
[OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] cpu_adagrad ............ [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] torch version .................... 1.10.2 torch cuda version ............... 11.3 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] deepspeed info ................... 0.6.0+ba9c4cc7, ba9c4cc7, master deepspeed wheel compiled w. ...... torch 1.10, cuda 11.3 > setting tensorboard ... **** Git info for Megatron: git_hash=f689231 git_branch=log-grad-norm **** > initializing torch distributed ... > initializing tensor model parallel with size 4 > initializing pipeline model parallel with size 32 > setting random seeds to 43 ... 
[2022-02-05 07:59:39,337] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2761 and data parallel seed: 43 > compiling dataset index builder ... make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/data' make: Nothing to be done for 'default'. make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/data' >>> done with dataset index builder. Compilation time: 0.135 seconds > compiling and loading fused kernels ... /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:295: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module scaled_upper_triang_masked_softmax_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module scaled_upper_triang_masked_softmax_cuda... 
/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:295: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module scaled_masked_softmax_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module scaled_masked_softmax_cuda... /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:295: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! 
warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module fused_mix_prec_layer_norm_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module fused_mix_prec_layer_norm_cuda... >>> done with compiling and loading fused kernels. Compilation time: 9.478 seconds time to initialize megatron (seconds): -27.048 [after megatron is initialized] datetime: 2022-02-05 07:59:48 building GPT model ... [2022-02-05 07:59:49,011] [INFO] [utils.py:824:see_memory_usage] Before Building Model [2022-02-05 07:59:49,012] [INFO] [utils.py:825:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB [2022-02-05 07:59:49,012] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 47.96 GB, percent = 9.5% SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=1, data=0, model=0): 4, ProcessCoord(pipe=1, data=0, model=1): 5, ProcessCoord(pipe=1, data=0, model=2): 6, ProcessCoord(pipe=1, data=0, model=3): 7, ProcessCoord(pipe=2, data=0, model=0): 8, ProcessCoord(pipe=2, data=0, model=1): 9, ProcessCoord(pipe=2, data=0, model=2): 10, ProcessCoord(pipe=2, data=0, model=3): 11, ProcessCoord(pipe=3, data=0, model=0): 12, ProcessCoord(pipe=3, data=0, model=1): 13, ProcessCoord(pipe=3, data=0, model=2): 14, ProcessCoord(pipe=3, data=0, model=3): 15, ProcessCoord(pipe=4, data=0, model=0): 16, ProcessCoord(pipe=4, data=0, model=1): 17, ProcessCoord(pipe=4, data=0, model=2): 18, ProcessCoord(pipe=4, data=0, model=3): 19, ProcessCoord(pipe=5, data=0, model=0): 20, ProcessCoord(pipe=5, data=0, 
model=1): 21, ProcessCoord(pipe=5, data=0, model=2): 22, ProcessCoord(pipe=5, data=0, model=3): 23, ProcessCoord(pipe=6, data=0, model=0): 24, ProcessCoord(pipe=6, data=0, model=1): 25, ProcessCoord(pipe=6, data=0, model=2): 26, ProcessCoord(pipe=6, data=0, model=3): 27, ProcessCoord(pipe=7, data=0, model=0): 28, ProcessCoord(pipe=7, data=0, model=1): 29, ProcessCoord(pipe=7, data=0, model=2): 30, ProcessCoord(pipe=7, data=0, model=3): 31, ProcessCoord(pipe=8, data=0, model=0): 32, ProcessCoord(pipe=8, data=0, model=1): 33, ProcessCoord(pipe=8, data=0, model=2): 34, ProcessCoord(pipe=8, data=0, model=3): 35, ProcessCoord(pipe=9, data=0, model=0): 36, ProcessCoord(pipe=9, data=0, model=1): 37, ProcessCoord(pipe=9, data=0, model=2): 38, ProcessCoord(pipe=9, data=0, model=3): 39, ProcessCoord(pipe=10, data=0, model=0): 40, ProcessCoord(pipe=10, data=0, model=1): 41, ProcessCoord(pipe=10, data=0, model=2): 42, ProcessCoord(pipe=10, data=0, model=3): 43, ProcessCoord(pipe=11, data=0, model=0): 44, ProcessCoord(pipe=11, data=0, model=1): 45, ProcessCoord(pipe=11, data=0, model=2): 46, ProcessCoord(pipe=11, data=0, model=3): 47, ProcessCoord(pipe=12, data=0, model=0): 48, ProcessCoord(pipe=12, data=0, model=1): 49, ProcessCoord(pipe=12, data=0, model=2): 50, ProcessCoord(pipe=12, data=0, model=3): 51, ProcessCoord(pipe=13, data=0, model=0): 52, ProcessCoord(pipe=13, data=0, model=1): 53, ProcessCoord(pipe=13, data=0, model=2): 54, ProcessCoord(pipe=13, data=0, model=3): 55, ProcessCoord(pipe=14, data=0, model=0): 56, ProcessCoord(pipe=14, data=0, model=1): 57, ProcessCoord(pipe=14, data=0, model=2): 58, ProcessCoord(pipe=14, data=0, model=3): 59, ProcessCoord(pipe=15, data=0, model=0): 60, ProcessCoord(pipe=15, data=0, model=1): 61, ProcessCoord(pipe=15, data=0, model=2): 62, ProcessCoord(pipe=15, data=0, model=3): 63, ProcessCoord(pipe=16, data=0, model=0): 64, ProcessCoord(pipe=16, data=0, model=1): 65, ProcessCoord(pipe=16, data=0, model=2): 66, ProcessCoord(pipe=16, 
data=0, model=3): 67, ProcessCoord(pipe=17, data=0, model=0): 68, ProcessCoord(pipe=17, data=0, model=1): 69, ProcessCoord(pipe=17, data=0, model=2): 70, ProcessCoord(pipe=17, data=0, model=3): 71, ProcessCoord(pipe=18, data=0, model=0): 72, ProcessCoord(pipe=18, data=0, model=1): 73, ProcessCoord(pipe=18, data=0, model=2): 74, ProcessCoord(pipe=18, data=0, model=3): 75, ProcessCoord(pipe=19, data=0, model=0): 76, ProcessCoord(pipe=19, data=0, model=1): 77, ProcessCoord(pipe=19, data=0, model=2): 78, ProcessCoord(pipe=19, data=0, model=3): 79, ProcessCoord(pipe=20, data=0, model=0): 80, ProcessCoord(pipe=20, data=0, model=1): 81, ProcessCoord(pipe=20, data=0, model=2): 82, ProcessCoord(pipe=20, data=0, model=3): 83, ProcessCoord(pipe=21, data=0, model=0): 84, ProcessCoord(pipe=21, data=0, model=1): 85, ProcessCoord(pipe=21, data=0, model=2): 86, ProcessCoord(pipe=21, data=0, model=3): 87, ProcessCoord(pipe=22, data=0, model=0): 88, ProcessCoord(pipe=22, data=0, model=1): 89, ProcessCoord(pipe=22, data=0, model=2): 90, ProcessCoord(pipe=22, data=0, model=3): 91, ProcessCoord(pipe=23, data=0, model=0): 92, ProcessCoord(pipe=23, data=0, model=1): 93, ProcessCoord(pipe=23, data=0, model=2): 94, ProcessCoord(pipe=23, data=0, model=3): 95, ProcessCoord(pipe=24, data=0, model=0): 96, ProcessCoord(pipe=24, data=0, model=1): 97, ProcessCoord(pipe=24, data=0, model=2): 98, ProcessCoord(pipe=24, data=0, model=3): 99, ProcessCoord(pipe=25, data=0, model=0): 100, ProcessCoord(pipe=25, data=0, model=1): 101, ProcessCoord(pipe=25, data=0, model=2): 102, ProcessCoord(pipe=25, data=0, model=3): 103, ProcessCoord(pipe=26, data=0, model=0): 104, ProcessCoord(pipe=26, data=0, model=1): 105, ProcessCoord(pipe=26, data=0, model=2): 106, ProcessCoord(pipe=26, data=0, model=3): 107, ProcessCoord(pipe=27, data=0, model=0): 108, ProcessCoord(pipe=27, data=0, model=1): 109, ProcessCoord(pipe=27, data=0, model=2): 110, ProcessCoord(pipe=27, data=0, model=3): 111, ProcessCoord(pipe=28, data=0, 
model=0): 112, ProcessCoord(pipe=28, data=0, model=1): 113, ProcessCoord(pipe=28, data=0, model=2): 114, ProcessCoord(pipe=28, data=0, model=3): 115, ProcessCoord(pipe=29, data=0, model=0): 116, ProcessCoord(pipe=29, data=0, model=1): 117, ProcessCoord(pipe=29, data=0, model=2): 118, ProcessCoord(pipe=29, data=0, model=3): 119, ProcessCoord(pipe=30, data=0, model=0): 120, ProcessCoord(pipe=30, data=0, model=1): 121, ProcessCoord(pipe=30, data=0, model=2): 122, ProcessCoord(pipe=30, data=0, model=3): 123, ProcessCoord(pipe=31, data=0, model=0): 124, ProcessCoord(pipe=31, data=0, model=1): 125, ProcessCoord(pipe=31, data=0, model=2): 126, ProcessCoord(pipe=31, data=0, model=3): 127} [2022-02-05 07:59:50,728] [INFO] [module.py:365:_partition_layers] Partitioning pipeline stages with method type:transformer stage=0 layers=5 0: _to_float16 1: EmbeddingPipe 2: 3: ParallelTransformerLayerPipe 4: ParallelTransformerLayerPipe stage=1 layers=2 5: ParallelTransformerLayerPipe 6: ParallelTransformerLayerPipe stage=2 layers=2 7: ParallelTransformerLayerPipe 8: ParallelTransformerLayerPipe stage=3 layers=2 9: ParallelTransformerLayerPipe 10: ParallelTransformerLayerPipe stage=4 layers=2 11: ParallelTransformerLayerPipe 12: ParallelTransformerLayerPipe stage=5 layers=2 13: ParallelTransformerLayerPipe 14: ParallelTransformerLayerPipe stage=6 layers=2 15: ParallelTransformerLayerPipe 16: ParallelTransformerLayerPipe stage=7 layers=2 17: ParallelTransformerLayerPipe 18: ParallelTransformerLayerPipe stage=8 layers=2 19: ParallelTransformerLayerPipe 20: ParallelTransformerLayerPipe stage=9 layers=2 21: ParallelTransformerLayerPipe 22: ParallelTransformerLayerPipe stage=10 layers=2 23: ParallelTransformerLayerPipe 24: ParallelTransformerLayerPipe stage=11 layers=2 25: ParallelTransformerLayerPipe 26: ParallelTransformerLayerPipe stage=12 layers=2 27: ParallelTransformerLayerPipe 28: ParallelTransformerLayerPipe stage=13 layers=2 29: ParallelTransformerLayerPipe 30: 
ParallelTransformerLayerPipe stage=14 layers=2 31: ParallelTransformerLayerPipe 32: ParallelTransformerLayerPipe stage=15 layers=2 33: ParallelTransformerLayerPipe 34: ParallelTransformerLayerPipe stage=16 layers=2 35: ParallelTransformerLayerPipe 36: ParallelTransformerLayerPipe stage=17 layers=2 37: ParallelTransformerLayerPipe 38: ParallelTransformerLayerPipe stage=18 layers=2 39: ParallelTransformerLayerPipe 40: ParallelTransformerLayerPipe stage=19 layers=2 41: ParallelTransformerLayerPipe 42: ParallelTransformerLayerPipe stage=20 layers=2 43: ParallelTransformerLayerPipe 44: ParallelTransformerLayerPipe stage=21 layers=2 45: ParallelTransformerLayerPipe 46: ParallelTransformerLayerPipe stage=22 layers=2 47: ParallelTransformerLayerPipe 48: ParallelTransformerLayerPipe stage=23 layers=2 49: ParallelTransformerLayerPipe 50: ParallelTransformerLayerPipe stage=24 layers=2 51: ParallelTransformerLayerPipe 52: ParallelTransformerLayerPipe stage=25 layers=2 53: ParallelTransformerLayerPipe 54: ParallelTransformerLayerPipe stage=26 layers=2 55: ParallelTransformerLayerPipe 56: ParallelTransformerLayerPipe stage=27 layers=2 57: ParallelTransformerLayerPipe 58: ParallelTransformerLayerPipe stage=28 layers=2 59: ParallelTransformerLayerPipe 60: ParallelTransformerLayerPipe stage=29 layers=2 61: ParallelTransformerLayerPipe 62: ParallelTransformerLayerPipe stage=30 layers=2 63: ParallelTransformerLayerPipe 64: ParallelTransformerLayerPipe stage=31 layers=6 65: ParallelTransformerLayerPipe 66: ParallelTransformerLayerPipe 67: 68: MixedFusedLayerNorm 69: EmbeddingPipe 70: float16_to_fp32 loss: CrossEntropy [2022-02-05 07:59:51,331] [INFO] [utils.py:824:see_memory_usage] After Building Model [2022-02-05 07:59:51,332] [INFO] [utils.py:825:see_memory_usage] MA 1.88 GB Max_MA 1.88 GB CA 1.91 GB Max_CA 2 GB [2022-02-05 07:59:51,332] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 48.3 GB, percent = 9.6% setting training iterations to 292968 > learning rate 
decay style: cosine DeepSpeed is enabled. [2022-02-05 07:59:51,453] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed info: version=0.6.0+ba9c4cc7, git-hash=ba9c4cc7, git-branch=master Rank: 106 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 72 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 107 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 78 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 79 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 73 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 31 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 30 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 22 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 109 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 108 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 23 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 46 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 121 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 63 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 18 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 120 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 19 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 62 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 47 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 67 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 115 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 66 partition count [1, 1] and sizes[(807360000, False), 
(179800, False)] Rank: 37 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 36 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 86 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 87 partition count [1, 1] and sizes[(807360000, False), (179800, False)] [2022-02-05 07:59:52,841] [INFO] [engine.py:275:__init__] DeepSpeed Flops Profiler Enabled: False [2022-02-05 07:59:52,841] [INFO] [engine.py:1099:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer [2022-02-05 07:59:52,841] [INFO] [engine.py:1105:_configure_optimizer] Using client Optimizer as basic optimizer [2022-02-05 07:59:52,841] [INFO] [engine.py:1121:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam [2022-02-05 07:59:52,841] [INFO] [utils.py:48:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type= [2022-02-05 07:59:52,841] [INFO] [logging.py:69:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer [2022-02-05 07:59:52,841] [INFO] [stage_1_and_2.py:121:__init__] Reduce bucket size 500000000 [2022-02-05 07:59:52,841] [INFO] [stage_1_and_2.py:122:__init__] Allgather bucket size 500000000 [2022-02-05 07:59:52,841] [INFO] [stage_1_and_2.py:123:__init__] CPU Offload: False [2022-02-05 07:59:52,841] [INFO] [stage_1_and_2.py:124:__init__] Round robin gradient partitioning: False Rank: 114 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 27 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 26 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 95 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 94 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 102 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 103 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 93 partition count [1, 1] and 
sizes[(807360000, False), (179800, False)] Rank: 92 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 53 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 2 partition count [1, 1] and sizes[(978123600, False), (191400, False)] Rank: 1 partition count [1, 1] and sizes[(978123600, False), (191400, False)] Rank: 0 partition count [1, 1] and sizes[(978123600, False), (191400, False)] Rank: 52 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 10 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 3 partition count [1, 1] and sizes[(978123600, False), (191400, False)] Rank: 11 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 74 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 75 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 43 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 77 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 76 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 40 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 110 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 42 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 111 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 41 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 118 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 98 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 99 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 119 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 21 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 20 partition count [1, 1] and sizes[(807360000, 
False), (179800, False)] Rank: 55 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 51 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 54 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 50 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 91 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 83 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 90 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 82 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 68 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 69 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 34 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 35 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 61 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 60 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 100 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 101 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 38 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 39 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 84 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 44 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 45 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 85 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 13 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 12 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 28 partition count [1, 1] and sizes[(807360000, False), (179800, 
False)] Rank: 29 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 7 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 6 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 16 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 97 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 17 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 81 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 14 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 80 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 15 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 48 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 49 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 57 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 56 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 96 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 123 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 122 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 33 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 32 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 25 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 70 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 71 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 24 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 8 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 9 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 5 partition 
count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 4 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 125 partition count [1, 1] and sizes[(978123600, False), (214600, False)] Rank: 124 partition count [1, 1] and sizes[(978123600, False), (214600, False)] Rank: 116 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 117 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 113 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 112 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 89 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 59 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 88 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 58 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 105 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 104 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 65 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 64 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 126 partition count [1, 1] and sizes[(978123600, False), (214600, False)] Rank: 127 partition count [1, 1] and sizes[(978123600, False), (214600, False)] [2022-02-05 07:59:57,421] [INFO] [utils.py:824:see_memory_usage] Before initializing optimizer states [2022-02-05 07:59:57,421] [INFO] [utils.py:825:see_memory_usage] MA 5.47 GB Max_MA 7.29 GB CA 9.25 GB Max_CA 9 GB [2022-02-05 07:59:57,421] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 48.43 GB, percent = 9.6% [2022-02-05 07:59:57,497] [INFO] [utils.py:824:see_memory_usage] After initializing optimizer states [2022-02-05 07:59:57,498] [INFO] [utils.py:825:see_memory_usage] MA 12.76 GB Max_MA 16.41 GB CA 20.19 GB Max_CA 20 GB [2022-02-05 07:59:57,498] [INFO] 
[utils.py:833:see_memory_usage] CPU Virtual Memory: used = 48.43 GB, percent = 9.6% [2022-02-05 07:59:57,498] [INFO] [stage_1_and_2.py:493:__init__] optimizer state initialized [2022-02-05 07:59:57,520] [INFO] [utils.py:824:see_memory_usage] After initializing ZeRO optimizer [2022-02-05 07:59:57,520] [INFO] [utils.py:825:see_memory_usage] MA 12.76 GB Max_MA 12.76 GB CA 20.19 GB Max_CA 20 GB [2022-02-05 07:59:57,520] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 48.43 GB, percent = 9.6% [2022-02-05 07:59:57,520] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam [2022-02-05 07:59:57,520] [INFO] [engine.py:805:_configure_lr_scheduler] DeepSpeed using client LR scheduler [2022-02-05 07:59:57,520] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed LR Scheduler = [2022-02-05 07:59:57,520] [INFO] [logging.py:69:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.95), (0.9, 0.95)] [2022-02-05 07:59:57,521] [INFO] [config.py:1058:print] DeepSpeedEngine configuration: [2022-02-05 07:59:57,521] [INFO] [config.py:1062:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2022-02-05 07:59:57,521] [INFO] [config.py:1062:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2022-02-05 07:59:57,521] [INFO] [config.py:1062:print] amp_enabled .................. False [2022-02-05 07:59:57,521] [INFO] [config.py:1062:print] amp_params ................... False [2022-02-05 07:59:57,521] [INFO] [config.py:1062:print] autotuning_config ............ 
{ "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": null, "exps_dir": null, "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 } [2022-02-05 07:59:57,521] [INFO] [config.py:1062:print] bfloat16_enabled ............. False [2022-02-05 07:59:57,521] [INFO] [config.py:1062:print] checkpoint_tag_validation_enabled True [2022-02-05 07:59:57,521] [INFO] [config.py:1062:print] checkpoint_tag_validation_fail False [2022-02-05 07:59:57,521] [INFO] [config.py:1062:print] communication_data_type ...... None [2022-02-05 07:59:57,521] [INFO] [config.py:1062:print] curriculum_enabled ........... True [2022-02-05 07:59:57,521] [INFO] [config.py:1062:print] curriculum_params ............ {'curriculum_type': 'seqlen', 'min_difficulty': 64, 'max_difficulty': 2048, 'schedule_type': 'fixed_linear', 'schedule_config': {'total_curriculum_step': 36000, 'difficulty_step': 8}} [2022-02-05 07:59:57,521] [INFO] [config.py:1062:print] dataloader_drop_last ......... False [2022-02-05 07:59:57,521] [INFO] [config.py:1062:print] disable_allgather ............ False [2022-02-05 07:59:57,521] [INFO] [config.py:1062:print] dump_state ................... False [2022-02-05 07:59:57,521] [INFO] [config.py:1062:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1} [2022-02-05 07:59:57,521] [INFO] [config.py:1062:print] eigenvalue_enabled ........... 
False [2022-02-05 07:59:57,521] [INFO] [config.py:1062:print] eigenvalue_gas_boundary_resolution 1 [2022-02-05 07:59:57,521] [INFO] [config.py:1062:print] eigenvalue_layer_name ........ bert.encoder.layer [2022-02-05 07:59:57,521] [INFO] [config.py:1062:print] eigenvalue_layer_num ......... 0 [2022-02-05 07:59:57,521] [INFO] [config.py:1062:print] eigenvalue_max_iter .......... 100 [2022-02-05 07:59:57,521] [INFO] [config.py:1062:print] eigenvalue_stability ......... 1e-06 [2022-02-05 07:59:57,521] [INFO] [config.py:1062:print] eigenvalue_tol ............... 0.01 [2022-02-05 07:59:57,521] [INFO] [config.py:1062:print] eigenvalue_verbose ........... False [2022-02-05 07:59:57,521] [INFO] [config.py:1062:print] elasticity_enabled ........... False [2022-02-05 07:59:57,521] [INFO] [config.py:1062:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2022-02-05 07:59:57,521] [INFO] [config.py:1062:print] fp16_enabled ................. True [2022-02-05 07:59:57,521] [INFO] [config.py:1062:print] fp16_master_weights_and_gradients False [2022-02-05 07:59:57,521] [INFO] [config.py:1062:print] fp16_mixed_quantize .......... False [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] global_rank .................. 0 [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] gradient_accumulation_steps .. 2048 [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] gradient_clipping ............ 1.0 [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] gradient_predivide_factor .... 1.0 [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] initial_dynamic_scale ........ 4096 [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] loss_scale ................... 0 [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] memory_breakdown ............. False [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] optimizer_legacy_fusion ...... 
False [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] optimizer_name ............... None [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] optimizer_params ............. None [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] pld_enabled .................. False [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] pld_params ................... False [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] prescale_gradients ........... False [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] quantize_change_rate ......... 0.001 [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] quantize_groups .............. 1 [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] quantize_offset .............. 1000 [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] quantize_period .............. 1000 [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] quantize_rounding ............ 0 [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] quantize_start_bits .......... 16 [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] quantize_target_bits ......... 8 [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] quantize_training_enabled .... False [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] quantize_type ................ 0 [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] quantize_verbose ............. False [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] scheduler_name ............... None [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] scheduler_params ............. None [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] sparse_attention ............. None [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] sparse_gradients_enabled ..... 
False [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] steps_per_print .............. 2000 [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] tensorboard_enabled .......... False [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] tensorboard_job_name ......... DeepSpeedJobName [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] tensorboard_output_path ...... [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] train_batch_size ............. 2048 [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] train_micro_batch_size_per_gpu 1 [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] use_quantizer_kernel ......... False [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] wall_clock_breakdown ......... False [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] world_size ................... 1 [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] zero_allow_untested_optimizer False [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] zero_config .................. { "stage": 1, "contiguous_gradients": true, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": false, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_16bit_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false } [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] zero_enabled ................. True [2022-02-05 07:59:57,522] [INFO] [config.py:1062:print] zero_optimization_stage ...... 
1 [2022-02-05 07:59:57,522] [INFO] [config.py:1064:print] json = { "train_micro_batch_size_per_gpu": 1, "train_batch_size": 2.048000e+03, "gradient_clipping": 1.0, "elastic_checkpoint": true, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "curriculum_learning": { "enabled": true, "curriculum_type": "seqlen", "min_difficulty": 64, "max_difficulty": 2.048000e+03, "schedule_type": "fixed_linear", "schedule_config": { "total_curriculum_step": 3.600000e+04, "difficulty_step": 8 } }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false } [2022-02-05 07:59:57,522] [INFO] [engine.py:93:__init__] CONFIG: micro_batches=2048 micro_batch_size=1 [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=0 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=6 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=7 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=1 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=2 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=3 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 
(104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=5 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=4 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=69 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=102 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=99 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=70 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=66 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=71 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=68 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=67 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 
(807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=101 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=98 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=103 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=119 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=64 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=97 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=65 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=38 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=37 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] 
[engine.py:151:__init__] RANK=36 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=39 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=100 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=116 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=114 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=96 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=87 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=82 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=33 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=34 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 
(104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=50 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=54 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=53 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=115 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=118 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=106 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=109 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=122 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=121 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=123 
STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=112 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=32 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=89 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=90 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=95 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=52 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=51 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=62 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=59 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 
(104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=113 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=107 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=85 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=81 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=126 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=120 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=49 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=48 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=22 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=23 STAGE=5 LAYERS=2 [13, 15) 
STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=19 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=18 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=117 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=110 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=111 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=83 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=84 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=35 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=79 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 
07:59:59,882] [INFO] [engine.py:151:__init__] RANK=78 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=73 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=91 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=25 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=31 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=124 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=55 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=57 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=20 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=41 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) 
TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=42 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=46 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=108 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=105 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=104 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=86 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=80 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=72 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=75 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] 
[engine.py:151:__init__] RANK=77 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=94 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=92 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=29 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=30 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=27 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=125 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=127 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=60 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=56 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 
(104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=61 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=13 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=14 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=10 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=11 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=21 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=44 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=40 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=43 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=74 STAGE=18 LAYERS=2 
[39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=93 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=88 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=28 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=58 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=12 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=8 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=17 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=45 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=47 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 
07:59:59,882] [INFO] [engine.py:151:__init__] RANK=76 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=26 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=63 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=16 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=24 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=15 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-05 07:59:59,882] [INFO] [engine.py:151:__init__] RANK=9 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
> using checkpoint value 6e-05 for learning rate
> using checkpoint value 6e-06 for minimum learning rate
> using checkpoint value 3750000 for warmup iterations
> using checkpoint value 600000000 for total number of iterations
> using checkpoint value cosine for decay style
[2022-02-05 08:00:23,507] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 32
[2022-02-05 08:00:25,244] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 20
[2022-02-05 08:00:25,359] [INFO]
[engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 32 [2022-02-05 08:00:26,121] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 56 [2022-02-05 08:00:26,392] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 28 [2022-02-05 08:00:26,607] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 38 [2022-02-05 08:00:26,772] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 20 [2022-02-05 08:00:26,821] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 44 [2022-02-05 08:00:26,886] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 76 [2022-02-05 08:00:27,354] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 85 [2022-02-05 08:00:27,428] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 16 [2022-02-05 08:00:27,763] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 56 [2022-02-05 08:00:27,900] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 36 [2022-02-05 08:00:28,018] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 38 [2022-02-05 08:00:28,070] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 28 [2022-02-05 08:00:28,328] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 89 [2022-02-05 08:00:28,389] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 44 [2022-02-05 08:00:28,409] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 76 [2022-02-05 08:00:28,671] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 37 [2022-02-05 08:00:28,838] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 16 [2022-02-05 08:00:28,928] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 113 [2022-02-05 08:00:28,987] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 85 [2022-02-05 08:00:29,028] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 80 [2022-02-05 08:00:29,152] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 88 [2022-02-05 08:00:29,181] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 31 [2022-02-05 08:00:29,289] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 121 [2022-02-05 08:00:29,294] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 36 [2022-02-05 08:00:29,422] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 10 [2022-02-05 08:00:29,678] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 98 [2022-02-05 08:00:29,679] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 64 [2022-02-05 08:00:29,680] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 52 [2022-02-05 08:00:29,700] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 106 [2022-02-05 08:00:29,730] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 63 [2022-02-05 08:00:29,752] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 48 [2022-02-05 08:00:29,820] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 72 [2022-02-05 08:00:29,964] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 11 [2022-02-05 08:00:29,985] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 89 [2022-02-05 08:00:30,058] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 120 [2022-02-05 08:00:30,092] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 94 [2022-02-05 08:00:30,188] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 51 [2022-02-05 08:00:30,341] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 59 [2022-02-05 08:00:30,344] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 37 [2022-02-05 08:00:30,486] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 80 [2022-02-05 08:00:30,521] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 86 [2022-02-05 08:00:30,554] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 4 [2022-02-05 08:00:30,564] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 53 [2022-02-05 08:00:30,649] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 88 [2022-02-05 08:00:30,661] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 113 [2022-02-05 08:00:30,672] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 35 [2022-02-05 08:00:30,793] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 90 [2022-02-05 08:00:30,859] [INFO] 
[engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 31 [2022-02-05 08:00:30,861] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 117 [2022-02-05 08:00:30,877] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 66 [2022-02-05 08:00:30,951] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 57 [2022-02-05 08:00:31,119] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 63 [2022-02-05 08:00:31,235] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 72 [2022-02-05 08:00:31,461] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 64 [2022-02-05 08:00:31,499] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 94 [2022-02-05 08:00:31,499] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 10 [2022-02-05 08:00:31,630] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 84 [2022-02-05 08:00:31,631] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 106 [2022-02-05 08:00:31,632] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 11 [2022-02-05 08:00:31,725] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 47 [2022-02-05 08:00:31,734] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 112 [2022-02-05 08:00:31,768] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 68 [2022-02-05 08:00:31,790] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 17 [2022-02-05 08:00:31,804] [INFO] 
[engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 121 [2022-02-05 08:00:31,823] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 59 [2022-02-05 08:00:31,831] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 69 [2022-02-05 08:00:31,837] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 98 [2022-02-05 08:00:31,841] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 75 [2022-02-05 08:00:31,856] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 2 [2022-02-05 08:00:31,858] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 120 [2022-02-05 08:00:32,030] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 86 [2022-02-05 08:00:32,042] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 81 [2022-02-05 08:00:32,097] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 48 [2022-02-05 08:00:32,120] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 77 [2022-02-05 08:00:32,136] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 51 [2022-02-05 08:00:32,152] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 0 [2022-02-05 08:00:32,257] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 90 [2022-02-05 08:00:32,299] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 117 [2022-02-05 08:00:32,314] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 53 [2022-02-05 08:00:32,316] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 109 [2022-02-05 08:00:32,319] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 52 [2022-02-05 08:00:32,412] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 39 [2022-02-05 08:00:32,462] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 35 [2022-02-05 08:00:32,466] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 43 [2022-02-05 08:00:32,497] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 66 [2022-02-05 08:00:32,533] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 41 [2022-02-05 08:00:32,649] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 4 [2022-02-05 08:00:32,659] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 57 [2022-02-05 08:00:32,677] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 46 [2022-02-05 08:00:32,724] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 100 [2022-02-05 08:00:32,917] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 5 [2022-02-05 08:00:32,954] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 83 [2022-02-05 08:00:33,011] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 115 [2022-02-05 08:00:33,149] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 9 [2022-02-05 08:00:33,178] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 112 [2022-02-05 08:00:33,211] [INFO] 
[engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 17 [2022-02-05 08:00:33,247] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 75 [2022-02-05 08:00:33,251] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 104 [2022-02-05 08:00:33,283] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 103 [2022-02-05 08:00:33,318] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 21 [2022-02-05 08:00:33,320] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 47 [2022-02-05 08:00:33,322] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 124 [2022-02-05 08:00:33,380] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 12 [2022-02-05 08:00:33,437] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 68 [2022-02-05 08:00:33,468] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 24 [2022-02-05 08:00:33,532] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 107 [2022-02-05 08:00:33,552] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 119 [2022-02-05 08:00:33,579] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 69 [2022-02-05 08:00:33,587] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 26 [2022-02-05 08:00:33,592] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 77 [2022-02-05 08:00:33,594] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 61 [2022-02-05 08:00:33,603] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 73 [2022-02-05 08:00:33,889] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 109 [2022-02-05 08:00:33,916] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 43 [2022-02-05 08:00:33,942] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 45 [2022-02-05 08:00:33,980] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 55 [2022-02-05 08:00:34,016] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 84 [2022-02-05 08:00:34,124] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 105 [2022-02-05 08:00:34,125] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 100 [2022-02-05 08:00:34,158] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 33 [2022-02-05 08:00:34,168] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 39 [2022-02-05 08:00:34,176] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 79 [2022-02-05 08:00:34,179] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 41 [2022-02-05 08:00:34,246] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 82 [2022-02-05 08:00:34,340] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 46 [2022-02-05 08:00:34,419] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 115 [2022-02-05 08:00:34,432] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 2 [2022-02-05 08:00:34,442] [INFO] 
[engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 0 [2022-02-05 08:00:34,459] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 110 checkpoint version 3.0 [2022-02-05 08:00:34,479] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 83 [2022-02-05 08:00:34,563] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 96 [2022-02-05 08:00:34,568] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 5 [2022-02-05 08:00:34,586] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 95 [2022-02-05 08:00:34,609] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 81 [2022-02-05 08:00:34,610] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 54 [2022-02-05 08:00:34,629] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 123 [2022-02-05 08:00:34,657] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 9 [2022-02-05 08:00:34,742] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 104 [2022-02-05 08:00:34,807] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 97 [2022-02-05 08:00:34,810] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 6 [2022-02-05 08:00:34,853] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 15 [2022-02-05 08:00:34,902] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 30 [2022-02-05 08:00:34,908] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 103 [2022-02-05 08:00:34,965] [INFO] 
[engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 24 [2022-02-05 08:00:34,982] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 119 [2022-02-05 08:00:35,031] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 12 [2022-02-05 08:00:35,071] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 61 [2022-02-05 08:00:35,081] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 21 [2022-02-05 08:00:35,085] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 26 [2022-02-05 08:00:35,113] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 58 [2022-02-05 08:00:35,118] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 127 [2022-02-05 08:00:35,128] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 34 [2022-02-05 08:00:35,135] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 91 [2022-02-05 08:00:35,179] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 78 [2022-02-05 08:00:35,224] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 65 [2022-02-05 08:00:35,228] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 107 [2022-02-05 08:00:35,246] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 87 [2022-02-05 08:00:35,342] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 19 [2022-02-05 08:00:35,360] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 55 [2022-02-05 08:00:35,369] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 22 [2022-02-05 08:00:35,493] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 45 [2022-02-05 08:00:35,538] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 71 [2022-02-05 08:00:35,565] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 33 [2022-02-05 08:00:35,586] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 73 [2022-02-05 08:00:35,593] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 8 [2022-02-05 08:00:35,684] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 108 [2022-02-05 08:00:35,691] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 124 [2022-02-05 08:00:35,699] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 82 [2022-02-05 08:00:35,722] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 7 [2022-02-05 08:00:35,748] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 23 [2022-02-05 08:00:35,770] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 62 [2022-02-05 08:00:35,778] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 102 [2022-02-05 08:00:35,790] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 79 [2022-02-05 08:00:35,803] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 42 [2022-02-05 08:00:35,828] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 105 [2022-02-05 08:00:35,845] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 60 [2022-02-05 08:00:36,002] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 54 [2022-02-05 08:00:36,008] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 95 [2022-02-05 08:00:36,046] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 18 [2022-02-05 08:00:36,172] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 125 [2022-02-05 08:00:36,192] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 29 [2022-02-05 08:00:36,222] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 122 [2022-02-05 08:00:36,278] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 99 [2022-02-05 08:00:36,339] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 126 [2022-02-05 08:00:36,366] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 6 [2022-02-05 08:00:36,369] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 30 [2022-02-05 08:00:36,369] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 96 [2022-02-05 08:00:36,415] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 110 [2022-02-05 08:00:36,447] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 74 [2022-02-05 08:00:36,457] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 97 [2022-02-05 08:00:36,462] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 49 [2022-02-05 08:00:36,509] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 114 [2022-02-05 08:00:36,513] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 58 [2022-02-05 08:00:36,522] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 25 [2022-02-05 08:00:36,522] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 34 [2022-02-05 08:00:36,526] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 93 [2022-02-05 08:00:36,537] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 91 [2022-02-05 08:00:36,539] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 15 [2022-02-05 08:00:36,548] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 70 [2022-02-05 08:00:36,600] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 101 [2022-02-05 08:00:36,654] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 87 [2022-02-05 08:00:36,731] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 14 [2022-02-05 08:00:36,732] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 27 [2022-02-05 08:00:36,757] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 65 [2022-02-05 08:00:36,770] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 78 [2022-02-05 08:00:36,800] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 118 [2022-02-05 08:00:36,828] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 123 [2022-02-05 08:00:36,870] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 92 [2022-02-05 08:00:36,888] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 111 [2022-02-05 08:00:36,911] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 127 [2022-02-05 08:00:36,915] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 13 [2022-02-05 08:00:37,059] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 19 [2022-02-05 08:00:37,068] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 3 [2022-02-05 08:00:37,078] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 8 [2022-02-05 08:00:37,088] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 116 [2022-02-05 08:00:37,098] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 71 [2022-02-05 08:00:37,188] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 1 [2022-02-05 08:00:37,196] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 102 [2022-02-05 08:00:37,204] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 62 [2022-02-05 08:00:37,228] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 40 [2022-02-05 08:00:37,239] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 42 [2022-02-05 08:00:37,247] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 7 [2022-02-05 08:00:37,252] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 108 [2022-02-05 08:00:37,256] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 67 [2022-02-05 08:00:37,288] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 50 [2022-02-05 08:00:37,300] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 60 [2022-02-05 08:00:37,341] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 22 [2022-02-05 08:00:37,474] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 18 [2022-02-05 08:00:37,522] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 23 [2022-02-05 08:00:37,730] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 99 [2022-02-05 08:00:37,816] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 74 [2022-02-05 08:00:37,854] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 49 [2022-02-05 08:00:37,870] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 122 [2022-02-05 08:00:37,895] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 114 [2022-02-05 08:00:37,915] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 125 [2022-02-05 08:00:37,979] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 101 [2022-02-05 08:00:38,033] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 93 [2022-02-05 08:00:38,103] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 126 [2022-02-05 08:00:38,116] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 14 [2022-02-05 08:00:38,132] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 
zero partition checkpoints for rank 29 [2022-02-05 08:00:38,293] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 111 [2022-02-05 08:00:38,337] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 27 [2022-02-05 08:00:38,398] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 13 [2022-02-05 08:00:38,417] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 70 [2022-02-05 08:00:38,433] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 92 [2022-02-05 08:00:38,451] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 118 [2022-02-05 08:00:38,455] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 25 [2022-02-05 08:00:38,666] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 67 [2022-02-05 08:00:38,692] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 50 [2022-02-05 08:00:38,723] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 116 [2022-02-05 08:00:38,861] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 3 [2022-02-05 08:00:38,897] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 40 [2022-02-05 08:00:38,945] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 1 successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 at iteration 16250 time (ms) | load-checkpoint: 37763.36 /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/utils.py:279: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several 
copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 125.22432 estimated model parameters without embeddings: 103.368064 [after model, optimizer, and learning rate scheduler are built] datetime: 2022-02-05 08:00:38 > building train, validation, and test datasets ... > datasets target sizes (minimum size): train: 600000000 validation: 20008960 test: 10240 > building train, validation, and test datasets for GPT ... > building dataset index ... reading sizes... reading pointers... reading document index... creating numpy buffer of mmap... creating memory view of numpy buffer... > finished creating indexed dataset in 0.072176 seconds number of documents: 304230423 > dataset split: train: document indices in [0, 288714672) total of 288714672 documents validation: document indices in [288714672, 303926193) total of 15211521 documents test: document indices in [303926193, 304230423) total of 304230 documents > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_doc_idx.npy > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_sample_idx.npy > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_shuffle_idx.npy loaded indexed file in 0.248 seconds total number of samples: 657686117 total number of epochs: 5 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_20008960ns_2048sl_43s_doc_idx.npy > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_20008960ns_2048sl_43s_sample_idx.npy > loading shuffle-idx mapping from 
/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_20008960ns_2048sl_43s_shuffle_idx.npy loaded indexed file in 0.159 seconds total number of samples: 20781483 total number of epochs: 3 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_doc_idx.npy > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_sample_idx.npy > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_shuffle_idx.npy loaded indexed file in 0.072 seconds total number of samples: 137384 total number of epochs: 1 > finished creating GPT datasets ... [after dataloaders are built] datetime: 2022-02-05 08:00:46 done with setup ... training ... Number of parameters: [tensor rank - pipeline rank] w/ and w/o embeddings: time (ms) | model-and-optimizer-setup: 50006.74 | train/valid/test-data-iterators-setup: 6864.82 [003-015] 103.3651B / 103.3651B [001-030] 103.3651B / 103.3651B [001-001] 103.3651B / 103.3651B [002-016] 103.3651B / 103.3651B[002-017] 103.3651B / 103.3651B [001-017] 103.3651B / 103.3651B [002-004] 103.3651B / 103.3651B [001-025] 103.3651B / 103.3651B [001-024] 103.3651B / 103.3651B [001-006] 103.3651B / 103.3651B[001-007] 103.3651B / 103.3651B [003-007] 103.3651B / 103.3651B [003-030] 103.3651B / 103.3651B [002-001] 103.3651B / 103.3651B [001-000] 125.2243B / 103.3681B[003-000] 125.2243B / 103.3681B [003-012] 103.3651B / 103.3651B [002-011] 103.3651B / 103.3651B[003-010] 103.3651B / 103.3651B [003-029] 103.3651B / 103.3651B[002-028] 103.3651B / 103.3651B[003-028] 103.3651B / 103.3651B [001-029] 103.3651B / 103.3651B [002-026] 103.3651B / 103.3651B [002-027] 103.3651B / 103.3651B [003-026] 103.3651B / 103.3651B [002-021] 103.3651B / 103.3651B[003-020] 103.3651B / 103.3651B[002-020] 103.3651B / 
103.3651B [002-009] 103.3651B / 103.3651B[003-009] 103.3651B / 103.3651B [002-008] 103.3651B / 103.3651B [001-008] 103.3651B / 103.3651B [002-000] 125.2243B / 103.3681B [002-012] 103.3651B / 103.3651B [003-013] 103.3651B / 103.3651B [003-016] 103.3651B / 103.3651B [002-014] 103.3651B / 103.3651B [002-002] 103.3651B / 103.3651B[003-002] 103.3651B / 103.3651B[001-003] 103.3651B / 103.3651B[003-003] 103.3651B / 103.3651B [001-026] 103.3651B / 103.3651B [001-020] 103.3651B / 103.3651B [001-009] 103.3651B / 103.3651B [003-008] 103.3651B / 103.3651B [003-025] 103.3651B / 103.3651B[002-024] 103.3651B / 103.3651B [002-018] 103.3651B / 103.3651B[001-018] 103.3651B / 103.3651B [003-022] 103.3651B / 103.3651B [002-023] 103.3651B / 103.3651B [002-007] 103.3651B / 103.3651B [002-030] 103.3651B / 103.3651B[002-031] 125.2273B / 103.3710B[003-031] 125.2273B / 103.3710B [003-001] 103.3651B / 103.3651B [001-012] 103.3651B / 103.3651B[001-013] 103.3651B / 103.3651B [002-013] 103.3651B / 103.3651B [001-016] 103.3651B / 103.3651B[003-017] 103.3651B / 103.3651B [001-014] 103.3651B / 103.3651B[002-015] 103.3651B / 103.3651B [001-015] 103.3651B / 103.3651B [001-002] 103.3651B / 103.3651B[002-003] 103.3651B / 103.3651B [003-004] 103.3651B / 103.3651B [001-011] 103.3651B / 103.3651B [001-028] 103.3651B / 103.3651B [001-027] 103.3651B / 103.3651B [001-021] 103.3651B / 103.3651B [003-021] 103.3651B / 103.3651B [003-024] 103.3651B / 103.3651B [003-018] 103.3651B / 103.3651B[002-019] 103.3651B / 103.3651B [001-019] 103.3651B / 103.3651B [003-019] 103.3651B / 103.3651B [001-023] 103.3651B / 103.3651B [002-022] 103.3651B / 103.3651B [003-023] 103.3651B / 103.3651B[001-022] 103.3651B / 103.3651B [002-006] 103.3651B / 103.3651B [000-012] 103.3651B / 103.3651B [003-014] 103.3651B / 103.3651B [001-004] 103.3651B / 103.3651B [002-005] 103.3651B / 103.3651B[003-005] 103.3651B / 103.3651B[001-005] 103.3651B / 103.3651B [002-010] 103.3651B / 103.3651B[001-010] 103.3651B / 103.3651B [003-011] 103.3651B / 
103.3651B [002-029] 103.3651B / 103.3651B [003-027] 103.3651B / 103.3651B [002-025] 103.3651B / 103.3651B [003-006] 103.3651B / 103.3651B [001-031] 125.2273B / 103.3710B [000-014] 103.3651B / 103.3651B [000-027] 103.3651B / 103.3651B [000-030] 103.3651B / 103.3651B [000-013] 103.3651B / 103.3651B [000-001] 103.3651B / 103.3651B [000-004] 103.3651B / 103.3651B [000-028] 103.3651B / 103.3651B [000-024] 103.3651B / 103.3651B [000-023] 103.3651B / 103.3651B [000-018] 103.3651B / 103.3651B [000-017] 103.3651B / 103.3651B[000-016] 103.3651B / 103.3651B [000-011] 103.3651B / 103.3651B [000-025] 103.3651B / 103.3651B [000-015] 103.3651B / 103.3651B [000-005] 103.3651B / 103.3651B [000-021] 103.3651B / 103.3651B [000-029] 103.3651B / 103.3651B [000-000] 125.2243B / 103.3681B [000-026] 103.3651B / 103.3651B [000-019] 103.3651B / 103.3651B [000-006] 103.3651B / 103.3651B[000-007] 103.3651B / 103.3651B [000-003] 103.3651B / 103.3651B[000-002] 103.3651B / 103.3651B [000-020] 103.3651B / 103.3651B [000-022] 103.3651B / 103.3651B [000-009] 103.3651B / 103.3651B[000-008] 103.3651B / 103.3651B [000-010] 103.3651B / 103.3651B [000-031] 125.2273B / 103.3710B [before the start of training step] datetime: 2022-02-05 08:00:46 [2022-02-05 08:00:46,575] [INFO] [checkpointing.py:547:forward] Activation Checkpointing Information [2022-02-05 08:00:46,575] [INFO] [checkpointing.py:548:forward] ----Partition Activations False, CPU CHECKPOINTING False [2022-02-05 08:00:46,575] [INFO] [checkpointing.py:551:forward] ----contiguous Memory Checkpointing False with 64 total layers [2022-02-05 08:00:46,575] [INFO] [checkpointing.py:554:forward] ----Synchronization False [2022-02-05 08:00:46,575] [INFO] [checkpointing.py:555:forward] ----Profiling time in checkpointing False srun: Job step aborted: Waiting up to 62 seconds for job step to finish. 
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 388694 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 388695 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 408154 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 408155 closing signal SIGTERM slurmstepd: error: *** STEP 1747321.0 ON jean-zay-iam01 CANCELLED AT 2022-02-05T08:01:24 *** WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 388696 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 408156 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 385609 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 388675 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 388697 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 408157 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 385944 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 385610 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 388676 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 408158 closing signal SIGTERM 
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 385945 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 388698 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1666596 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 383713 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 388699 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 385611 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 438715 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1666597 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 408159 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 388677 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 385612 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 438716 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 383700 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 383714 closing signal SIGTERM 
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 385733 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1666598 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 384520 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 383701 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 384521 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 383702 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 388700 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 385946 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 408160 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 388678 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 385613 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 385457 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 438717 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 385734 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 383715 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 384685 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1666599 closing signal SIGTERM 
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 387174 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 387175 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 384522 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 383703 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 408161 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 385458 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 385614 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 385459 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 438718 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 385947 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 383716 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 384686 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 388679 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1666600 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 388701 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 387176 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 388680 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 387177 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 383704 closing signal SIGTERM 
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 384523 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 385460 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 438719 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 385735 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 385948 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 383717 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 384105 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 384687 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1666601 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 383705 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 384524 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 384525 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 389785 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 385949 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 384688 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 384106 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 385950 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 383706 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 384526 closing signal SIGTERM 
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 383707 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 385615 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 385461 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 438720 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 385736 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 387178 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 388681 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 389786 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1666602 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 389787 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 385951 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 384107 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 389788 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 384689 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 383718 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 384108 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 383719 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 438721 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 385618 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 438722 closing signal 
SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 383720 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1666603 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 385462 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 384690 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 385737 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 388682 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 384527 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 385463 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 385738 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 387179 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 384691 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 387180 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 389789 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 385464 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 384109 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 387181 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 384692 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 385739 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 385740 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 389790 closing 
signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 384110 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 389791 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 384111 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 384112 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 389792 closing signal SIGTERM
/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun. Note that --use_env is set by default in torchrun. If your script expects `--local_rank` argument to be set, please change it to read from `os.environ['LOCAL_RANK']` instead. See https://pytorch.org/docs/stable/distributed.html#launch-utility for further instructions warnings.warn(
WARNING:torch.distributed.run: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
using world size: 128, data-parallel-size: 1, tensor-model-parallel size: 4, pipeline-model-parallel size: 32
using torch.float16 for parameters ...
------------------------ arguments ------------------------
  accumulate_allreduce_grads_in_fp32 .............. False
  adam_beta1 ...................................... 0.9
  adam_beta2 ...................................... 0.95
  adam_eps ........................................ 1e-08
  adlr_autoresume ................................. False
  adlr_autoresume_interval ........................ 1000
  apply_query_key_layer_scaling ................... True
  apply_residual_connection_post_layernorm ........ False
  attention_dropout ............................... 0.1
  attention_softmax_in_fp32 ....................... False
  bert_binary_head ................................ True
  bert_load ....................................... None
  bf16 ............................................ False
  bias_dropout_fusion ............................. True
  bias_gelu_fusion ................................ True
  biencoder_projection_dim ........................ 0
  biencoder_shared_query_context_model ............ False
  block_data_path ................................. None
  checkpoint_activations .......................... True
  checkpoint_in_cpu ............................... False
  checkpoint_num_layers ........................... 1
  clip_grad ....................................... 1.0
  codecarbon_dir .................................. None
  consumed_train_samples .......................... 0
  consumed_train_tokens ........................... 0
  consumed_valid_samples .......................... 0
  contigious_checkpointing ........................ False
  cpu_optimizer ................................... False
  cpu_torch_adam .................................. False
  curriculum_learning ............................. False
  data_impl ....................................... mmap
  data_parallel_size .............................. 1
  data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
  dataloader_type ................................. single
  DDP_impl ........................................ local
  decoder_seq_length .............................. None
  deepscale ....................................... False
  deepscale_config ................................ None
  deepspeed ....................................... True
  deepspeed_activation_checkpointing .............. True
  deepspeed_config ................................ ./ds_config.1747459.json
  deepspeed_mpi ................................... False
  distribute_checkpointed_activations ............. False
  distributed_backend ............................. nccl
  embed_layernorm ................................. True
  embedding_path .................................. None
  encoder_seq_length .............................. 2048
  eod_mask_loss ................................... False
  eval_interval ................................... 150
  eval_iters ...................................... 5
  eval_only ....................................... None
  evidence_data_path .............................. None
  exit_duration_in_mins ........................... 1185
  exit_interval ................................... None
  ffn_hidden_size ................................. 46400
  finetune ........................................ False
  fp16 ............................................ True
  fp16_lm_cross_entropy ........................... False
  fp32_residual_connection ........................ False
  gigaflos_no_embeds .............................. 0
  global_batch_size ............................... 2048
  glu_activation .................................. None
  hidden_dropout .................................. 0.1
  hidden_size ..................................... 11600
  hysteresis ...................................... 2
  ict_head_size ................................... None
  ict_load ........................................ None
  img_dim ......................................... 224
  indexer_batch_size .............................. 128
  indexer_log_interval ............................ 1000
  init_method_std ................................. 0.006
  init_method_xavier_uniform ...................... False
  initial_loss_scale .............................. 4294967296
  kv_channels ..................................... 145
  layernorm_epsilon ............................... 1e-05
  lazy_mpu_init ................................... None
  load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
  local_rank ...................................... 0
  log_batch_size_to_tensorboard ................... True
  log_interval .................................... 1
  log_learning_rate_to_tensorboard ................ True
  log_level ....................................... None
  log_level_replica ............................... None
  log_loss_scale_to_tensorboard ................... True
  log_num_zeros_in_grad ........................... False
  log_params_norm ................................. False
  log_timers_to_tensorboard ....................... True
  log_validation_ppl_to_tensorboard ............... True
  loss_on_targets_only ............................ False
  loss_scale ...................................... 12.0
  loss_scale_window ............................... 1000
  lr .............................................. 6e-05
  lr_decay_iters .................................. None
  lr_decay_samples ................................ None
  lr_decay_style .................................. cosine
  lr_decay_tokens ................................. 260000000000
  lr_warmup_fraction .............................. None
  lr_warmup_iters ................................. 0
  lr_warmup_samples ............................... 3750000
  make_vocab_size_divisible_by .................... 128
  mask_prob ....................................... 0.15
  masked_softmax_fusion ........................... True
  max_position_embeddings ......................... 2048
  memory_centric_tiled_linear ..................... False
  merge_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt
  micro_batch_size ................................ 1
  min_loss_scale .................................. 1.0
  min_lr .......................................... 6e-06
  mmap_warmup ..................................... False
  no_load_optim ................................... None
  no_load_rng ..................................... None
  no_save_optim ................................... None
  no_save_rng ..................................... None
  num_attention_heads ............................. 80
  num_channels .................................... 3
  num_classes ..................................... 1000
  num_layers ...................................... 64
  num_layers_per_virtual_pipeline_stage ........... None
  num_workers ..................................... 2
  onnx_safe ....................................... None
  openai_gelu ..................................... False
  optimizer ....................................... adam
  override_lr_scheduler ........................... False
  params_dtype .................................... torch.float16
  partition_activations ........................... False
  patch_dim ....................................... 16
  pipeline_model_parallel_size .................... 32
  position_embedding_type ......................... PositionEmbeddingType.absolute
  profile_backward ................................ False
  query_in_block_prob ............................. 0.1
  rampup_batch_size ............................... None
  rank ............................................ 0
  remote_device ................................... none
  reset_attention_mask ............................ False
  reset_position_ids .............................. False
  retriever_report_topk_accuracies ................ []
  retriever_score_scaling ......................... False
  retriever_seq_length ............................ 256
  reweight_loss_based_on_position_frequency ....... False
  sample_rate ..................................... 1.0
  save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
  save_interval ................................... 50
  scatter_gather_tensors_in_pipeline .............. True
  scattered_embeddings ............................ False
  seed ............................................ 43
  seq_length ...................................... 2048
  sgd_momentum .................................... 0.9
  short_seq_prob .................................. 0.1
  skip_train_iteration_range ...................... None
  split ........................................... 949,50,1
  split_transformers .............................. False
  synchronize_each_layer .......................... False
  tensor_model_parallel_size ...................... 4
  tensorboard_debug_dir ........................... None
  tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs//tensorboard/cl-a100
  tensorboard_log_interval ........................ 1
  tensorboard_queue_size .......................... 5
  test_weighted_split_names ....................... None
  test_weighted_split_paths ....................... None
  test_weighted_split_splits ...................... None
  test_weighted_split_weights ..................... None
  tile_factor ..................................... 1
  titles_data_path ................................ None
  tokenizer_name_or_path .......................... None
  tokenizer_type .................................. GPT2BPETokenizer
  train_iters ..................................... None
  train_samples ................................... 600000000
  train_tokens .................................... 300000000000
  train_weighted_split_paths ...................... None
  use_bnb_optimizer ............................... False
  use_checkpoint_lr_scheduler ..................... False
  use_contiguous_buffers_in_ddp ................... False
  use_cpu_initialization .......................... None
  use_one_sent_docs ............................... False
  use_pin_memory .................................. False
  valid_weighted_split_names ...................... None
  valid_weighted_split_paths ...................... None
  valid_weighted_split_splits ..................... None
  valid_weighted_split_weights .................... None
  virtual_pipeline_model_parallel_size ............ None
  vocab_extra_ids ................................. 0
  vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json
  weight_decay .................................... 0.1
  world_size ...................................... 128
  zero_allgather_bucket_size ...................... 0.0
  zero_contigious_gradients ....................... False
  zero_reduce_bucket_size ......................... 0.0
  zero_reduce_scatter ............................. False
  zero_stage ...................................... 1
-------------------- end of arguments ---------------------
setting number of micro-batches to constant 2048
> building GPT2BPETokenizer tokenizer ...
 > padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
> setting tensorboard ...
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch']
torch version .................... 1.10.2
torch cuda version ............... 11.3
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed']
deepspeed info ................... 0.6.0+ba9c4cc7, ba9c4cc7, master
deepspeed wheel compiled w. ...... torch 1.10, cuda 11.3
**** Git info for Megatron: git_hash=f689231 git_branch=log-grad-norm ****
> initializing torch distributed ...
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 32
> setting random seeds to 43 ...
[2022-02-05 08:03:27,404] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2761 and data parallel seed: 43
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.133 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:295: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
  warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:295: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
  warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:295: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
  warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
>>> done with compiling and loading fused kernels. Compilation time: 7.785 seconds
time to initialize megatron (seconds): 71.324
[after megatron is initialized] datetime: 2022-02-05 08:03:35
building GPT model ...
[2022-02-05 08:03:35,363] [INFO] [utils.py:824:see_memory_usage] Before Building Model
[2022-02-05 08:03:35,363] [INFO] [utils.py:825:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2022-02-05 08:03:35,364] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 47.99 GB, percent = 9.5%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=1, data=0, model=0): 4, ProcessCoord(pipe=1, data=0, model=1): 5, ProcessCoord(pipe=1, data=0, model=2): 6, ProcessCoord(pipe=1, data=0, model=3): 7, ProcessCoord(pipe=2, data=0, model=0): 8, ProcessCoord(pipe=2, data=0, model=1): 9, ProcessCoord(pipe=2, data=0, model=2): 10, ProcessCoord(pipe=2, data=0, model=3): 11, ProcessCoord(pipe=3, data=0, model=0): 12, ProcessCoord(pipe=3, data=0, model=1): 13, ProcessCoord(pipe=3, data=0, model=2): 14, ProcessCoord(pipe=3, data=0, model=3): 15, ProcessCoord(pipe=4, data=0, model=0): 16, ProcessCoord(pipe=4, data=0, model=1): 17, ProcessCoord(pipe=4, data=0, model=2): 18, ProcessCoord(pipe=4, data=0, model=3): 19, ProcessCoord(pipe=5, data=0, model=0): 20, ProcessCoord(pipe=5, data=0, 
model=1): 21, ProcessCoord(pipe=5, data=0, model=2): 22, ProcessCoord(pipe=5, data=0, model=3): 23, ProcessCoord(pipe=6, data=0, model=0): 24, ProcessCoord(pipe=6, data=0, model=1): 25, ProcessCoord(pipe=6, data=0, model=2): 26, ProcessCoord(pipe=6, data=0, model=3): 27, ProcessCoord(pipe=7, data=0, model=0): 28, ProcessCoord(pipe=7, data=0, model=1): 29, ProcessCoord(pipe=7, data=0, model=2): 30, ProcessCoord(pipe=7, data=0, model=3): 31, ProcessCoord(pipe=8, data=0, model=0): 32, ProcessCoord(pipe=8, data=0, model=1): 33, ProcessCoord(pipe=8, data=0, model=2): 34, ProcessCoord(pipe=8, data=0, model=3): 35, ProcessCoord(pipe=9, data=0, model=0): 36, ProcessCoord(pipe=9, data=0, model=1): 37, ProcessCoord(pipe=9, data=0, model=2): 38, ProcessCoord(pipe=9, data=0, model=3): 39, ProcessCoord(pipe=10, data=0, model=0): 40, ProcessCoord(pipe=10, data=0, model=1): 41, ProcessCoord(pipe=10, data=0, model=2): 42, ProcessCoord(pipe=10, data=0, model=3): 43, ProcessCoord(pipe=11, data=0, model=0): 44, ProcessCoord(pipe=11, data=0, model=1): 45, ProcessCoord(pipe=11, data=0, model=2): 46, ProcessCoord(pipe=11, data=0, model=3): 47, ProcessCoord(pipe=12, data=0, model=0): 48, ProcessCoord(pipe=12, data=0, model=1): 49, ProcessCoord(pipe=12, data=0, model=2): 50, ProcessCoord(pipe=12, data=0, model=3): 51, ProcessCoord(pipe=13, data=0, model=0): 52, ProcessCoord(pipe=13, data=0, model=1): 53, ProcessCoord(pipe=13, data=0, model=2): 54, ProcessCoord(pipe=13, data=0, model=3): 55, ProcessCoord(pipe=14, data=0, model=0): 56, ProcessCoord(pipe=14, data=0, model=1): 57, ProcessCoord(pipe=14, data=0, model=2): 58, ProcessCoord(pipe=14, data=0, model=3): 59, ProcessCoord(pipe=15, data=0, model=0): 60, ProcessCoord(pipe=15, data=0, model=1): 61, ProcessCoord(pipe=15, data=0, model=2): 62, ProcessCoord(pipe=15, data=0, model=3): 63, ProcessCoord(pipe=16, data=0, model=0): 64, ProcessCoord(pipe=16, data=0, model=1): 65, ProcessCoord(pipe=16, data=0, model=2): 66, ProcessCoord(pipe=16, 
data=0, model=3): 67, ProcessCoord(pipe=17, data=0, model=0): 68, ProcessCoord(pipe=17, data=0, model=1): 69, ProcessCoord(pipe=17, data=0, model=2): 70, ProcessCoord(pipe=17, data=0, model=3): 71, ProcessCoord(pipe=18, data=0, model=0): 72, ProcessCoord(pipe=18, data=0, model=1): 73, ProcessCoord(pipe=18, data=0, model=2): 74, ProcessCoord(pipe=18, data=0, model=3): 75, ProcessCoord(pipe=19, data=0, model=0): 76, ProcessCoord(pipe=19, data=0, model=1): 77, ProcessCoord(pipe=19, data=0, model=2): 78, ProcessCoord(pipe=19, data=0, model=3): 79, ProcessCoord(pipe=20, data=0, model=0): 80, ProcessCoord(pipe=20, data=0, model=1): 81, ProcessCoord(pipe=20, data=0, model=2): 82, ProcessCoord(pipe=20, data=0, model=3): 83, ProcessCoord(pipe=21, data=0, model=0): 84, ProcessCoord(pipe=21, data=0, model=1): 85, ProcessCoord(pipe=21, data=0, model=2): 86, ProcessCoord(pipe=21, data=0, model=3): 87, ProcessCoord(pipe=22, data=0, model=0): 88, ProcessCoord(pipe=22, data=0, model=1): 89, ProcessCoord(pipe=22, data=0, model=2): 90, ProcessCoord(pipe=22, data=0, model=3): 91, ProcessCoord(pipe=23, data=0, model=0): 92, ProcessCoord(pipe=23, data=0, model=1): 93, ProcessCoord(pipe=23, data=0, model=2): 94, ProcessCoord(pipe=23, data=0, model=3): 95, ProcessCoord(pipe=24, data=0, model=0): 96, ProcessCoord(pipe=24, data=0, model=1): 97, ProcessCoord(pipe=24, data=0, model=2): 98, ProcessCoord(pipe=24, data=0, model=3): 99, ProcessCoord(pipe=25, data=0, model=0): 100, ProcessCoord(pipe=25, data=0, model=1): 101, ProcessCoord(pipe=25, data=0, model=2): 102, ProcessCoord(pipe=25, data=0, model=3): 103, ProcessCoord(pipe=26, data=0, model=0): 104, ProcessCoord(pipe=26, data=0, model=1): 105, ProcessCoord(pipe=26, data=0, model=2): 106, ProcessCoord(pipe=26, data=0, model=3): 107, ProcessCoord(pipe=27, data=0, model=0): 108, ProcessCoord(pipe=27, data=0, model=1): 109, ProcessCoord(pipe=27, data=0, model=2): 110, ProcessCoord(pipe=27, data=0, model=3): 111, ProcessCoord(pipe=28, data=0, 
model=0): 112, ProcessCoord(pipe=28, data=0, model=1): 113, ProcessCoord(pipe=28, data=0, model=2): 114, ProcessCoord(pipe=28, data=0, model=3): 115, ProcessCoord(pipe=29, data=0, model=0): 116, ProcessCoord(pipe=29, data=0, model=1): 117, ProcessCoord(pipe=29, data=0, model=2): 118, ProcessCoord(pipe=29, data=0, model=3): 119, ProcessCoord(pipe=30, data=0, model=0): 120, ProcessCoord(pipe=30, data=0, model=1): 121, ProcessCoord(pipe=30, data=0, model=2): 122, ProcessCoord(pipe=30, data=0, model=3): 123, ProcessCoord(pipe=31, data=0, model=0): 124, ProcessCoord(pipe=31, data=0, model=1): 125, ProcessCoord(pipe=31, data=0, model=2): 126, ProcessCoord(pipe=31, data=0, model=3): 127}
[2022-02-05 08:03:37,074] [INFO] [module.py:365:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=5
    0: _to_float16
    1: EmbeddingPipe
    2:
    3: ParallelTransformerLayerPipe
    4: ParallelTransformerLayerPipe
stage=1 layers=2
    5: ParallelTransformerLayerPipe
    6: ParallelTransformerLayerPipe
stage=2 layers=2
    7: ParallelTransformerLayerPipe
    8: ParallelTransformerLayerPipe
stage=3 layers=2
    9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
stage=4 layers=2
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
stage=5 layers=2
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=6 layers=2
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
stage=7 layers=2
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
stage=8 layers=2
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
stage=9 layers=2
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
stage=10 layers=2
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
stage=11 layers=2
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
stage=12 layers=2
    27: ParallelTransformerLayerPipe
    28: ParallelTransformerLayerPipe
stage=13 layers=2
    29: ParallelTransformerLayerPipe
    30: ParallelTransformerLayerPipe
stage=14 layers=2
    31: ParallelTransformerLayerPipe
    32: ParallelTransformerLayerPipe
stage=15 layers=2
    33: ParallelTransformerLayerPipe
    34: ParallelTransformerLayerPipe
stage=16 layers=2
    35: ParallelTransformerLayerPipe
    36: ParallelTransformerLayerPipe
stage=17 layers=2
    37: ParallelTransformerLayerPipe
    38: ParallelTransformerLayerPipe
stage=18 layers=2
    39: ParallelTransformerLayerPipe
    40: ParallelTransformerLayerPipe
stage=19 layers=2
    41: ParallelTransformerLayerPipe
    42: ParallelTransformerLayerPipe
stage=20 layers=2
    43: ParallelTransformerLayerPipe
    44: ParallelTransformerLayerPipe
stage=21 layers=2
    45: ParallelTransformerLayerPipe
    46: ParallelTransformerLayerPipe
stage=22 layers=2
    47: ParallelTransformerLayerPipe
    48: ParallelTransformerLayerPipe
stage=23 layers=2
    49: ParallelTransformerLayerPipe
    50: ParallelTransformerLayerPipe
stage=24 layers=2
    51: ParallelTransformerLayerPipe
    52: ParallelTransformerLayerPipe
stage=25 layers=2
    53: ParallelTransformerLayerPipe
    54: ParallelTransformerLayerPipe
stage=26 layers=2
    55: ParallelTransformerLayerPipe
    56: ParallelTransformerLayerPipe
stage=27 layers=2
    57: ParallelTransformerLayerPipe
    58: ParallelTransformerLayerPipe
stage=28 layers=2
    59: ParallelTransformerLayerPipe
    60: ParallelTransformerLayerPipe
stage=29 layers=2
    61: ParallelTransformerLayerPipe
    62: ParallelTransformerLayerPipe
stage=30 layers=2
    63: ParallelTransformerLayerPipe
    64: ParallelTransformerLayerPipe
stage=31 layers=6
    65: ParallelTransformerLayerPipe
    66: ParallelTransformerLayerPipe
    67:
    68: MixedFusedLayerNorm
    69: EmbeddingPipe
    70: float16_to_fp32
  loss: CrossEntropy
[2022-02-05 08:03:37,686] [INFO] [utils.py:824:see_memory_usage] After Building Model
[2022-02-05 08:03:37,687] [INFO] [utils.py:825:see_memory_usage] MA 1.88 GB Max_MA 1.88 GB CA 1.91 GB Max_CA 2 GB
[2022-02-05 08:03:37,687] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 48.59 GB, percent = 9.7%
setting training iterations to 292968
> learning rate decay style: cosine
DeepSpeed is enabled.
[2022-02-05 08:03:37,803] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed info: version=0.6.0+ba9c4cc7, git-hash=ba9c4cc7, git-branch=master
Rank: 29 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
[2022-02-05 08:03:38,694] [INFO] [engine.py:275:__init__] DeepSpeed Flops Profiler Enabled: False
[2022-02-05 08:03:38,695] [INFO] [engine.py:1099:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2022-02-05 08:03:38,695] [INFO] [engine.py:1105:_configure_optimizer] Using client Optimizer as basic optimizer
[2022-02-05 08:03:38,695] [INFO] [engine.py:1121:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2022-02-05 08:03:38,695] [INFO] [utils.py:48:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2022-02-05 08:03:38,695] [INFO] [logging.py:69:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2022-02-05 08:03:38,695] [INFO] [stage_1_and_2.py:121:__init__] Reduce bucket size 500000000
[2022-02-05 08:03:38,695] [INFO] [stage_1_and_2.py:122:__init__] Allgather bucket size 500000000
[2022-02-05 08:03:38,695] [INFO] [stage_1_and_2.py:123:__init__] CPU Offload: False
[2022-02-05 08:03:38,695] [INFO] [stage_1_and_2.py:124:__init__] Round robin gradient partitioning: False
Rank: 21 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 57 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 56 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 25 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 98 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 24 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 20 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 32 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 99 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 28 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 39 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 38 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 31 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 33 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 30 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 7 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 60 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 61 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 6 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 106 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 107 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 110 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 111 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 15 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 17 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 14 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 8 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 16 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 47 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 9 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 114 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 46 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 94 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 122 partition count [1, 1] and sizes[(807360000, 
False), (179800, False)] Rank: 123 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 120 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 121 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 95 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 115 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 91 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 90 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 77 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 76 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 126 partition count [1, 1] and sizes[(978123600, False), (214600, False)] Rank: 127 partition count [1, 1] and sizes[(978123600, False), (214600, False)] Rank: 66 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 67 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 103 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 109 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 0 partition count [1, 1] and sizes[(978123600, False), (191400, False)] Rank: 102 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 108 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 1 partition count [1, 1] and sizes[(978123600, False), (191400, False)] Rank: 112 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 113 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 36 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 37 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 18 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 19 partition count [1, 1] and sizes[(807360000, False), 
(179800, False)] Rank: 23 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 34 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 35 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 22 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 118 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 119 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 69 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 68 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 71 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 70 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 51 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 92 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 93 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 50 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 43 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 55 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 42 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 54 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 10 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 11 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 62 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 82 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 63 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 83 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 85 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 
84 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 87 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 86 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 101 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 100 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 74 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 65 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 64 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 75 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 27 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 26 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 59 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 58 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 97 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 96 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 78 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 79 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 89 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 88 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 105 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 104 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 41 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 40 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 116 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 117 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 12 partition count 
[1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 13 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 45 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 44 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 73 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 72 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 81 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 4 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 5 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 80 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 48 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 49 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 53 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 52 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
Rank: 125 partition count [1, 1] and sizes[(978123600, False), (214600, False)]
Rank: 124 partition count [1, 1] and sizes[(978123600, False), (214600, False)]
Rank: 3 partition count [1, 1] and sizes[(978123600, False), (191400, False)]
Rank: 2 partition count [1, 1] and sizes[(978123600, False), (191400, False)]
[2022-02-05 08:03:44,871] [INFO] [utils.py:824:see_memory_usage] Before initializing optimizer states
[2022-02-05 08:03:44,871] [INFO] [utils.py:825:see_memory_usage] MA 5.47 GB Max_MA 7.29 GB CA 9.25 GB Max_CA 9 GB
[2022-02-05 08:03:44,871] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 48.46 GB, percent = 9.6%
[2022-02-05 08:03:44,944] [INFO] [utils.py:824:see_memory_usage] After initializing optimizer states
[2022-02-05 08:03:44,945] [INFO] [utils.py:825:see_memory_usage] MA 12.76 GB Max_MA 16.41 GB CA 20.19 GB Max_CA 20 GB
[2022-02-05 08:03:44,945] [INFO]
[utils.py:833:see_memory_usage] CPU Virtual Memory: used = 48.46 GB, percent = 9.6%
[2022-02-05 08:03:44,945] [INFO] [stage_1_and_2.py:493:__init__] optimizer state initialized
[2022-02-05 08:03:44,967] [INFO] [utils.py:824:see_memory_usage] After initializing ZeRO optimizer
[2022-02-05 08:03:44,967] [INFO] [utils.py:825:see_memory_usage] MA 12.76 GB Max_MA 12.76 GB CA 20.19 GB Max_CA 20 GB
[2022-02-05 08:03:44,967] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 48.46 GB, percent = 9.6%
[2022-02-05 08:03:44,967] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2022-02-05 08:03:44,967] [INFO] [engine.py:805:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2022-02-05 08:03:44,967] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2022-02-05 08:03:44,967] [INFO] [logging.py:69:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.95), (0.9, 0.95)]
[2022-02-05 08:03:44,968] [INFO] [config.py:1058:print] DeepSpeedEngine configuration:
[2022-02-05 08:03:44,968] [INFO] [config.py:1062:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
[2022-02-05 08:03:44,968] [INFO] [config.py:1062:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2022-02-05 08:03:44,968] [INFO] [config.py:1062:print] amp_enabled .................. False
[2022-02-05 08:03:44,968] [INFO] [config.py:1062:print] amp_params ................... False
[2022-02-05 08:03:44,968] [INFO] [config.py:1062:print] autotuning_config ............
{ "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": null, "exps_dir": null, "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 }
[2022-02-05 08:03:44,968] [INFO] [config.py:1062:print] bfloat16_enabled ............. False
[2022-02-05 08:03:44,968] [INFO] [config.py:1062:print] checkpoint_tag_validation_enabled True
[2022-02-05 08:03:44,968] [INFO] [config.py:1062:print] checkpoint_tag_validation_fail False
[2022-02-05 08:03:44,968] [INFO] [config.py:1062:print] communication_data_type ...... None
[2022-02-05 08:03:44,968] [INFO] [config.py:1062:print] curriculum_enabled ........... True
[2022-02-05 08:03:44,968] [INFO] [config.py:1062:print] curriculum_params ............ {'curriculum_type': 'seqlen', 'min_difficulty': 64, 'max_difficulty': 2048, 'schedule_type': 'fixed_linear', 'schedule_config': {'total_curriculum_step': 36000, 'difficulty_step': 8}}
[2022-02-05 08:03:44,968] [INFO] [config.py:1062:print] dataloader_drop_last ......... False
[2022-02-05 08:03:44,968] [INFO] [config.py:1062:print] disable_allgather ............ False
[2022-02-05 08:03:44,968] [INFO] [config.py:1062:print] dump_state ................... False
[2022-02-05 08:03:44,968] [INFO] [config.py:1062:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2022-02-05 08:03:44,968] [INFO] [config.py:1062:print] eigenvalue_enabled ...........
False
[2022-02-05 08:03:44,968] [INFO] [config.py:1062:print] eigenvalue_gas_boundary_resolution 1
[2022-02-05 08:03:44,968] [INFO] [config.py:1062:print] eigenvalue_layer_name ........ bert.encoder.layer
[2022-02-05 08:03:44,968] [INFO] [config.py:1062:print] eigenvalue_layer_num ......... 0
[2022-02-05 08:03:44,968] [INFO] [config.py:1062:print] eigenvalue_max_iter .......... 100
[2022-02-05 08:03:44,968] [INFO] [config.py:1062:print] eigenvalue_stability ......... 1e-06
[2022-02-05 08:03:44,968] [INFO] [config.py:1062:print] eigenvalue_tol ............... 0.01
[2022-02-05 08:03:44,968] [INFO] [config.py:1062:print] eigenvalue_verbose ........... False
[2022-02-05 08:03:44,968] [INFO] [config.py:1062:print] elasticity_enabled ........... False
[2022-02-05 08:03:44,968] [INFO] [config.py:1062:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2022-02-05 08:03:44,968] [INFO] [config.py:1062:print] fp16_enabled ................. True
[2022-02-05 08:03:44,968] [INFO] [config.py:1062:print] fp16_master_weights_and_gradients False
[2022-02-05 08:03:44,968] [INFO] [config.py:1062:print] fp16_mixed_quantize .......... False
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] global_rank .................. 0
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] gradient_accumulation_steps .. 2048
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] gradient_clipping ............ 1.0
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] gradient_predivide_factor .... 1.0
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] initial_dynamic_scale ........ 4096
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] loss_scale ................... 0
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] memory_breakdown ............. False
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] optimizer_legacy_fusion ......
False
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] optimizer_name ............... None
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] optimizer_params ............. None
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] pld_enabled .................. False
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] pld_params ................... False
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] prescale_gradients ........... False
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] quantize_change_rate ......... 0.001
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] quantize_groups .............. 1
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] quantize_offset .............. 1000
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] quantize_period .............. 1000
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] quantize_rounding ............ 0
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] quantize_start_bits .......... 16
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] quantize_target_bits ......... 8
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] quantize_training_enabled .... False
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] quantize_type ................ 0
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] quantize_verbose ............. False
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] scheduler_name ............... None
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] scheduler_params ............. None
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] sparse_attention ............. None
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] sparse_gradients_enabled .....
False
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] steps_per_print .............. 2000
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] tensorboard_enabled .......... False
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] tensorboard_job_name ......... DeepSpeedJobName
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] tensorboard_output_path ......
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] train_batch_size ............. 2048
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] train_micro_batch_size_per_gpu 1
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] use_quantizer_kernel ......... False
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] wall_clock_breakdown ......... False
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] world_size ................... 1
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] zero_allow_untested_optimizer False
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] zero_config .................. { "stage": 1, "contiguous_gradients": true, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": false, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_16bit_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false }
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] zero_enabled ................. True
[2022-02-05 08:03:44,969] [INFO] [config.py:1062:print] zero_optimization_stage ......
1
[2022-02-05 08:03:44,969] [INFO] [config.py:1064:print] json = { "train_micro_batch_size_per_gpu": 1, "train_batch_size": 2.048000e+03, "gradient_clipping": 1.0, "elastic_checkpoint": true, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "curriculum_learning": { "enabled": true, "curriculum_type": "seqlen", "min_difficulty": 64, "max_difficulty": 2.048000e+03, "schedule_type": "fixed_linear", "schedule_config": { "total_curriculum_step": 3.600000e+04, "difficulty_step": 8 } }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false }
[2022-02-05 08:03:44,969] [INFO] [engine.py:93:__init__] CONFIG: micro_batches=2048 micro_batch_size=1
[2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=0 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=1 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=5 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=7 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=4 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=68 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000
(104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=67 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=6 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=2 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=69 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=70 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=3 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=64 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=71 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=65 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=66 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 
(807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=35 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=36 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=39 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=38 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=99 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=101 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=98 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=103 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=33 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] 
[engine.py:151:__init__] RANK=37 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=100 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=83 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=86 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=82 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=84 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=51 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=53 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=55 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=34 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 
(104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=117 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=113 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=118 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=116 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=102 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=97 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=20 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=23 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=16 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=32 
STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=96 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=87 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=54 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=52 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=60 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=59 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=61 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=58 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=115 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 
(104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=76 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=74 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=21 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=18 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=114 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=112 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=109 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=111 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=110 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=107 STAGE=26 LAYERS=2 [55, 57) 
STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=80 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=122 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=126 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=127 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=124 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=50 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=49 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=57 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=56 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) 
[2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=62 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=22 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=17 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=30 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=26 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=119 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=92 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=95 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=94 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=85 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) 
TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=81 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=120 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=123 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=48 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=46 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=45 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=47 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=44 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=63 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] 
[engine.py:151:__init__] RANK=77 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=73 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=19 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=27 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=31 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=91 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=90 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=89 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=105 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=125 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 
(104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=14 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=8 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=10 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=15 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=41 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=43 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=79 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=25 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=24 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=88 STAGE=22 LAYERS=2 
[47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=93 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=108 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=106 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=121 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=11 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=12 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=9 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=40 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=72 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) 
[2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=28 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=29 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=104 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=42 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=78 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=75 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-05 08:03:47,324] [INFO] [engine.py:151:__init__] RANK=13 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) > using checkpoint value 6e-05 for learning rate > using checkpoint value 6e-06 for minimum learning rate > using checkpoint value 3750000 for warmup iterations > using checkpoint value 600000000 for total number of iterations > using checkpoint value cosine for decay style [2022-02-05 08:04:11,641] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 53 [2022-02-05 08:04:13,037] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 53 [2022-02-05 08:04:13,187] 
[INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 122 [2022-02-05 08:04:13,219] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 48 [2022-02-05 08:04:13,251] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 28 [2022-02-05 08:04:13,806] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 86 [2022-02-05 08:04:14,040] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 85 [2022-02-05 08:04:14,295] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 104 [2022-02-05 08:04:14,570] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 122 [2022-02-05 08:04:14,661] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 28 [2022-02-05 08:04:14,682] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 6 [2022-02-05 08:04:14,705] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 48 [2022-02-05 08:04:14,782] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 123 [2022-02-05 08:04:14,792] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 68 [2022-02-05 08:04:14,851] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 72 [2022-02-05 08:04:15,254] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 86 [2022-02-05 08:04:15,441] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 85 [2022-02-05 08:04:15,683] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 84 [2022-02-05 08:04:15,771] [INFO] 
[engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 104 [2022-02-05 08:04:16,143] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 6 [2022-02-05 08:04:16,162] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 64 [2022-02-05 08:04:16,197] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 68 [2022-02-05 08:04:16,362] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 72 [2022-02-05 08:04:16,401] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 20 [2022-02-05 08:04:16,569] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 123 [2022-02-05 08:04:16,580] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 80 [2022-02-05 08:04:16,802] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 83 [2022-02-05 08:04:16,994] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 36 [2022-02-05 08:04:17,047] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 43 [2022-02-05 08:04:17,103] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 0 [2022-02-05 08:04:17,140] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 84 [2022-02-05 08:04:17,312] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 88 [2022-02-05 08:04:17,316] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 109 [2022-02-05 08:04:17,336] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 108 [2022-02-05 08:04:17,408] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 33 [2022-02-05 08:04:17,437] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 98 [2022-02-05 08:04:17,471] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 11 [2022-02-05 08:04:17,476] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 93 [2022-02-05 08:04:17,484] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 92 [2022-02-05 08:04:17,490] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 66 [2022-02-05 08:04:17,531] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 64 [2022-02-05 08:04:17,539] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 16 [2022-02-05 08:04:17,561] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 116 [2022-02-05 08:04:17,688] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 69 [2022-02-05 08:04:17,721] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 112 [2022-02-05 08:04:17,790] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 20 [2022-02-05 08:04:17,829] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 100 [2022-02-05 08:04:17,851] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 105 [2022-02-05 08:04:17,877] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 56 [2022-02-05 08:04:18,128] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 80 [2022-02-05 08:04:18,196] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 59 [2022-02-05 08:04:18,279] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 83 [2022-02-05 08:04:18,351] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 82 [2022-02-05 08:04:18,373] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 36 [2022-02-05 08:04:18,447] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 96 [2022-02-05 08:04:18,468] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 43 [2022-02-05 08:04:18,523] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 127 [2022-02-05 08:04:18,567] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 87 [2022-02-05 08:04:18,702] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 4 [2022-02-05 08:04:18,779] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 44 [2022-02-05 08:04:18,792] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 0 [2022-02-05 08:04:18,804] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 33 checkpoint version 3.0 [2022-02-05 08:04:18,890] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 40 [2022-02-05 08:04:18,902] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 98 [2022-02-05 08:04:18,911] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 16 [2022-02-05 08:04:18,953] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 66 [2022-02-05 08:04:18,956] [INFO] 
[engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 11 [2022-02-05 08:04:19,041] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 109 [2022-02-05 08:04:19,053] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 94 [2022-02-05 08:04:19,076] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 116 [2022-02-05 08:04:19,117] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 8 [2022-02-05 08:04:19,139] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 69 [2022-02-05 08:04:19,163] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 73 [2022-02-05 08:04:19,171] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 37 [2022-02-05 08:04:19,215] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 108 [2022-02-05 08:04:19,239] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 21 [2022-02-05 08:04:19,280] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 110 [2022-02-05 08:04:19,311] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 112 [2022-02-05 08:04:19,323] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 56 [2022-02-05 08:04:19,333] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 100 [2022-02-05 08:04:19,351] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 105 [2022-02-05 08:04:19,432] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 88 [2022-02-05 08:04:19,497] [INFO] 
[engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 93 [2022-02-05 08:04:19,500] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 118 [2022-02-05 08:04:19,554] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 92 [2022-02-05 08:04:19,679] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 75 [2022-02-05 08:04:19,718] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 62 [2022-02-05 08:04:19,769] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 102 [2022-02-05 08:04:19,781] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 115 [2022-02-05 08:04:19,855] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 59 [2022-02-05 08:04:19,909] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 47 [2022-02-05 08:04:19,952] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 96 [2022-02-05 08:04:19,999] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 14 [2022-02-05 08:04:20,074] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 65 [2022-02-05 08:04:20,092] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 76 [2022-02-05 08:04:20,117] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 124 [2022-02-05 08:04:20,117] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 90 [2022-02-05 08:04:20,197] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 4 [2022-02-05 08:04:20,287] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 10 [2022-02-05 08:04:20,325] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 44 [2022-02-05 08:04:20,359] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 97 [2022-02-05 08:04:20,386] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 127 [2022-02-05 08:04:20,496] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 94 [2022-02-05 08:04:20,519] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 40 [2022-02-05 08:04:20,523] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 51 [2022-02-05 08:04:20,584] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 74 [2022-02-05 08:04:20,593] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 8 [2022-02-05 08:04:20,601] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 37 [2022-02-05 08:04:20,634] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 73 [2022-02-05 08:04:20,663] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 21 [2022-02-05 08:04:20,699] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 82 [2022-02-05 08:04:20,731] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 110 [2022-02-05 08:04:20,754] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 107 [2022-02-05 08:04:20,778] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 41 [2022-02-05 08:04:20,836] [INFO] 
[engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 87 [2022-02-05 08:04:20,951] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 118 [2022-02-05 08:04:21,119] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 121 [2022-02-05 08:04:21,128] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 89 [2022-02-05 08:04:21,180] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 102 [2022-02-05 08:04:21,183] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 46 [2022-02-05 08:04:21,201] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 75 [2022-02-05 08:04:21,226] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 115 [2022-02-05 08:04:21,245] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 117 [2022-02-05 08:04:21,286] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 62 [2022-02-05 08:04:21,306] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 47 [2022-02-05 08:04:21,426] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 67 [2022-02-05 08:04:21,458] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 18 [2022-02-05 08:04:21,489] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 58 [2022-02-05 08:04:21,498] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 77 [2022-02-05 08:04:21,503] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 26 [2022-02-05 08:04:21,510] [INFO] 
[engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 90 [2022-02-05 08:04:21,530] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 65 [2022-02-05 08:04:21,581] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 99 [2022-02-05 08:04:21,648] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 14 [2022-02-05 08:04:21,666] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 91 [2022-02-05 08:04:21,686] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 52 [2022-02-05 08:04:21,694] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 81 [2022-02-05 08:04:21,760] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 10 [2022-02-05 08:04:21,780] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 34 [2022-02-05 08:04:21,803] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 126 [2022-02-05 08:04:21,818] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 76 [2022-02-05 08:04:21,826] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 106 [2022-02-05 08:04:21,871] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 97 [2022-02-05 08:04:21,872] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 114 [2022-02-05 08:04:21,889] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 30 [2022-02-05 08:04:21,962] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 51 [2022-02-05 08:04:22,055] [INFO] 
[engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 74 [2022-02-05 08:04:22,072] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 124 [2022-02-05 08:04:22,092] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 57 [2022-02-05 08:04:22,137] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 107 [2022-02-05 08:04:22,222] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 41 [2022-02-05 08:04:22,233] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 63 [2022-02-05 08:04:22,244] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 120 [2022-02-05 08:04:22,246] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 50 [2022-02-05 08:04:22,323] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 95 [2022-02-05 08:04:22,382] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 13 [2022-02-05 08:04:22,499] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 42 [2022-02-05 08:04:22,504] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 113 [2022-02-05 08:04:22,510] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 125 [2022-02-05 08:04:22,528] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 24 [2022-02-05 08:04:22,534] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 49 [2022-02-05 08:04:22,590] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 121 [2022-02-05 08:04:22,601] [INFO] 
[engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 89 [2022-02-05 08:04:22,621] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 119 [2022-02-05 08:04:22,753] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 5 [2022-02-05 08:04:22,825] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 46 [2022-02-05 08:04:22,836] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 27 [2022-02-05 08:04:22,883] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 18 [2022-02-05 08:04:22,987] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 77 [2022-02-05 08:04:22,998] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 67 [2022-02-05 08:04:22,998] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 99 [2022-02-05 08:04:23,096] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 81 [2022-02-05 08:04:23,099] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 91 [2022-02-05 08:04:23,180] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 58 [2022-02-05 08:04:23,188] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 26 [2022-02-05 08:04:23,208] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 52 [2022-02-05 08:04:23,280] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 117 [2022-02-05 08:04:23,333] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 114 [2022-02-05 08:04:23,336] [INFO] [engine.py:2688:_get_all_zero_checkpoints] 
successfully read 1 ZeRO state_dicts for rank 111 [2022-02-05 08:04:23,389] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 103 [2022-02-05 08:04:23,415] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 30 [2022-02-05 08:04:23,438] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 15 [2022-02-05 08:04:23,551] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 106 [2022-02-05 08:04:23,596] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 22 [2022-02-05 08:04:23,600] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 39 [2022-02-05 08:04:23,609] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 34 [2022-02-05 08:04:23,626] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 57 [2022-02-05 08:04:23,720] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 120 [2022-02-05 08:04:23,740] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 7 [2022-02-05 08:04:23,745] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 2 [2022-02-05 08:04:23,776] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 63 [2022-02-05 08:04:23,782] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 95 [2022-02-05 08:04:23,836] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 126 [2022-02-05 08:04:23,877] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 61 [2022-02-05 08:04:23,949] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO 
state_dicts for rank 1 [2022-02-05 08:04:23,953] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 24 [2022-02-05 08:04:23,975] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 42 [2022-02-05 08:04:23,990] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 29 [2022-02-05 08:04:23,996] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 113 [2022-02-05 08:04:24,000] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 13 [2022-02-05 08:04:24,013] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 19 [2022-02-05 08:04:24,021] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 60 [2022-02-05 08:04:24,024] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 23 [2022-02-05 08:04:24,066] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 55 [2022-02-05 08:04:24,111] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 119 [2022-02-05 08:04:24,127] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 3 [2022-02-05 08:04:24,131] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 25 [2022-02-05 08:04:24,233] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 17 [2022-02-05 08:04:24,240] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 125 [2022-02-05 08:04:24,241] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 5 [2022-02-05 08:04:24,332] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 101 
[2022-02-05 08:04:24,336] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 54 [2022-02-05 08:04:24,357] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 50 [2022-02-05 08:04:24,362] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 38 [2022-02-05 08:04:24,375] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 79 [2022-02-05 08:04:24,415] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 27 [2022-02-05 08:04:24,433] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 49 [2022-02-05 08:04:24,444] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 32 [2022-02-05 08:04:24,467] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 12 [2022-02-05 08:04:24,574] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 31 [2022-02-05 08:04:24,575] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 45 [2022-02-05 08:04:24,749] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 111 [2022-02-05 08:04:24,764] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 71 [2022-02-05 08:04:24,812] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 70 [2022-02-05 08:04:24,956] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 9 [2022-02-05 08:04:25,043] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 35 [2022-02-05 08:04:25,106] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 39 [2022-02-05 
08:04:25,116] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 15 [2022-02-05 08:04:25,219] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 7 [2022-02-05 08:04:25,258] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 78 [2022-02-05 08:04:25,288] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 22 [2022-02-05 08:04:25,390] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 103 [2022-02-05 08:04:25,473] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 29 [2022-02-05 08:04:25,508] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 61 [2022-02-05 08:04:25,535] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 19 [2022-02-05 08:04:25,573] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 23 [2022-02-05 08:04:25,629] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 60 [2022-02-05 08:04:25,674] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 17 [2022-02-05 08:04:25,721] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 101 [2022-02-05 08:04:25,859] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 32 [2022-02-05 08:04:25,859] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 25 [2022-02-05 08:04:25,909] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 38 [2022-02-05 08:04:25,933] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 79 [2022-02-05 08:04:25,973] [INFO] 
[engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 45 [2022-02-05 08:04:25,992] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 12 [2022-02-05 08:04:26,019] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 31 [2022-02-05 08:04:26,112] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 55 [2022-02-05 08:04:26,146] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 2 [2022-02-05 08:04:26,202] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 3 [2022-02-05 08:04:26,283] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 1 [2022-02-05 08:04:26,290] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 54 [2022-02-05 08:04:26,341] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 9 [2022-02-05 08:04:26,348] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 71 [2022-02-05 08:04:26,455] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 35 [2022-02-05 08:04:26,703] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 78 [2022-02-05 08:04:26,727] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 70 successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 at iteration 16250 time (ms) | load-checkpoint: 38128.34 /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/utils.py:279: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings 
will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
estimated model parameters: 125.22432
estimated model parameters without embeddings: 103.368064
[after model, optimizer, and learning rate scheduler are built] datetime: 2022-02-05 08:04:26
> building train, validation, and test datasets ...
 > datasets target sizes (minimum size):
    train: 600000000
    validation: 20008960
    test: 10240
> building train, validation, and test datasets for GPT ...
 > building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
 > finished creating indexed dataset in 0.063015 seconds
    number of documents: 304230423
 > dataset split:
    train:
     document indices in [0, 288714672) total of 288714672 documents
    validation:
     document indices in [288714672, 303926193) total of 15211521 documents
    test:
     document indices in [303926193, 304230423) total of 304230 documents
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_shuffle_idx.npy
    loaded indexed file in 0.104 seconds
    total number of samples: 657686117
    total number of epochs: 5
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_20008960ns_2048sl_43s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_20008960ns_2048sl_43s_sample_idx.npy
 > loading shuffle-idx mapping from
/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_20008960ns_2048sl_43s_shuffle_idx.npy loaded indexed file in 0.088 seconds total number of samples: 20781483 total number of epochs: 3 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_doc_idx.npy > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_sample_idx.npy > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_shuffle_idx.npy loaded indexed file in 0.065 seconds total number of samples: 137384 total number of epochs: 1 > finished creating GPT datasets ... [after dataloaders are built] datetime: 2022-02-05 08:04:34 done with setup ... training ... Number of parameters: [tensor rank - pipeline rank] w/ and w/o embeddings: time (ms) | model-and-optimizer-setup: 51420.57 | train/valid/test-data-iterators-setup: 6629.65 [001-000] 125.2243B / 103.3681B [001-001] 103.3651B / 103.3651B [003-001] 103.3651B / 103.3651B [002-001] 103.3651B / 103.3651B [001-025] 103.3651B / 103.3651B [003-030] 103.3651B / 103.3651B [002-017] 103.3651B / 103.3651B [002-016] 103.3651B / 103.3651B [001-017] 103.3651B / 103.3651B [003-018] 103.3651B / 103.3651B [001-018] 103.3651B / 103.3651B [002-000] 125.2243B / 103.3681B [003-000] 125.2243B / 103.3681B [002-024] 103.3651B / 103.3651B [001-024] 103.3651B / 103.3651B[003-024] 103.3651B / 103.3651B [002-020] 103.3651B / 103.3651B [003-021] 103.3651B / 103.3651B [002-003] 103.3651B / 103.3651B [002-002] 103.3651B / 103.3651B [002-011] 103.3651B / 103.3651B [002-019] 103.3651B / 103.3651B [002-018] 103.3651B / 103.3651B [003-019] 103.3651B / 103.3651B [001-005] 103.3651B / 103.3651B[003-005] 103.3651B / 103.3651B [002-006] 103.3651B / 103.3651B[003-006] 103.3651B / 103.3651B [001-007] 103.3651B / 
103.3651B [001-006] 103.3651B / 103.3651B[002-007] 103.3651B / 103.3651B [003-022] 103.3651B / 103.3651B[001-022] 103.3651B / 103.3651B [002-026] 103.3651B / 103.3651B[001-026] 103.3651B / 103.3651B [002-025] 103.3651B / 103.3651B [001-030] 103.3651B / 103.3651B[001-031] 125.2273B / 103.3710B [001-013] 103.3651B / 103.3651B [002-013] 103.3651B / 103.3651B [003-016] 103.3651B / 103.3651B [003-017] 103.3651B / 103.3651B [001-016] 103.3651B / 103.3651B [001-014] 103.3651B / 103.3651B [002-015] 103.3651B / 103.3651B[003-015] 103.3651B / 103.3651B [001-019] 103.3651B / 103.3651B [001-009] 103.3651B / 103.3651B[002-008] 103.3651B / 103.3651B [001-008] 103.3651B / 103.3651B [001-028] 103.3651B / 103.3651B [001-029] 103.3651B / 103.3651B[003-028] 103.3651B / 103.3651B [003-029] 103.3651B / 103.3651B [003-025] 103.3651B / 103.3651B [001-021] 103.3651B / 103.3651B [003-020] 103.3651B / 103.3651B[001-020] 103.3651B / 103.3651B [002-031] 125.2273B / 103.3710B[002-030] 103.3651B / 103.3651B [001-003] 103.3651B / 103.3651B [001-002] 103.3651B / 103.3651B [001-011] 103.3651B / 103.3651B[003-011] 103.3651B / 103.3651B [002-014] 103.3651B / 103.3651B [002-004] 103.3651B / 103.3651B[002-005] 103.3651B / 103.3651B [001-004] 103.3651B / 103.3651B [003-007] 103.3651B / 103.3651B [002-028] 103.3651B / 103.3651B [001-023] 103.3651B / 103.3651B [003-026] 103.3651B / 103.3651B [002-021] 103.3651B / 103.3651B [003-013] 103.3651B / 103.3651B [001-012] 103.3651B / 103.3651B[002-012] 103.3651B / 103.3651B [003-012] 103.3651B / 103.3651B [003-003] 103.3651B / 103.3651B [003-002] 103.3651B / 103.3651B [003-010] 103.3651B / 103.3651B [001-010] 103.3651B / 103.3651B [001-015] 103.3651B / 103.3651B [003-014] 103.3651B / 103.3651B [003-004] 103.3651B / 103.3651B [002-009] 103.3651B / 103.3651B [002-029] 103.3651B / 103.3651B [002-022] 103.3651B / 103.3651B [002-023] 103.3651B / 103.3651B[003-023] 103.3651B / 103.3651B [001-027] 103.3651B / 103.3651B [003-027] 103.3651B / 103.3651B[002-027] 103.3651B 
/ 103.3651B [003-031] 125.2273B / 103.3710B [002-010] 103.3651B / 103.3651B [000-004] 103.3651B / 103.3651B [003-009] 103.3651B / 103.3651B [003-008] 103.3651B / 103.3651B [000-005] 103.3651B / 103.3651B [000-029] 103.3651B / 103.3651B [000-026] 103.3651B / 103.3651B [000-003] 103.3651B / 103.3651B [000-010] 103.3651B / 103.3651B [000-011] 103.3651B / 103.3651B [000-007] 103.3651B / 103.3651B [000-017] 103.3651B / 103.3651B [000-014] 103.3651B / 103.3651B [000-028] 103.3651B / 103.3651B [000-008] 103.3651B / 103.3651B[000-009] 103.3651B / 103.3651B [000-006] 103.3651B / 103.3651B [000-002] 103.3651B / 103.3651B [000-015] 103.3651B / 103.3651B [000-021] 103.3651B / 103.3651B [000-016] 103.3651B / 103.3651B [000-019] 103.3651B / 103.3651B [000-001] 103.3651B / 103.3651B[000-000] 125.2243B / 103.3681B [000-020] 103.3651B / 103.3651B [000-030] 103.3651B / 103.3651B [000-018] 103.3651B / 103.3651B [000-027] 103.3651B / 103.3651B [000-024] 103.3651B / 103.3651B [000-012] 103.3651B / 103.3651B [000-022] 103.3651B / 103.3651B [000-025] 103.3651B / 103.3651B [000-023] 103.3651B / 103.3651B [000-031] 125.2273B / 103.3710B [000-013] 103.3651B / 103.3651B [before the start of training step] datetime: 2022-02-05 08:04:34 [2022-02-05 08:04:34,034] [INFO] [checkpointing.py:547:forward] Activation Checkpointing Information [2022-02-05 08:04:34,034] [INFO] [checkpointing.py:548:forward] ----Partition Activations False, CPU CHECKPOINTING False [2022-02-05 08:04:34,034] [INFO] [checkpointing.py:551:forward] ----contiguous Memory Checkpointing False with 64 total layers [2022-02-05 08:04:34,034] [INFO] [checkpointing.py:554:forward] ----Synchronization False [2022-02-05 08:04:34,034] [INFO] [checkpointing.py:555:forward] ----Profiling time in checkpointing False [Rank 123] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 iteration 16251/ 292968 | consumed samples: 33282048 | consumed 
tokens: 16901799936 | elapsed time per iteration (ms): 245413.1 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.688533E+00 | loss scale: 32768.0 | grad norm: 34552.619 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.008 | TFLOPs: 51.33 | [Rank 127] (after 16251 iterations) memory (MB) | allocated: 13251.47705078125 | max allocated: 20715.84521484375 | reserved: 24404.0 | max reserved: 24404.0 [Rank 124] (after 16251 iterations) memory (MB) | allocated: 13251.47705078125 | max allocated: 20715.82568359375 | reserved: 24404.0 | max reserved: 24404.0 [Rank 126] (after 16251 iterations) memory (MB) | allocated: 13251.47705078125 | max allocated: 20715.13818359375 | reserved: 24404.0 | max reserved: 24404.0 [Rank 122] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 16] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 8] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 20] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 0] (after 16251 iterations) memory (MB) | allocated: 13206.53662109375 | max allocated: 20670.15283203125 | reserved: 24404.0 | max reserved: 24404.0 [Rank 24] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 28] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 4] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max 
allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 12] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 7] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 3] (after 16251 iterations) memory (MB) | allocated: 13206.31005859375 | max allocated: 20669.92626953125 | reserved: 24404.0 | max reserved: 24404.0 [Rank 6] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 14] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 18] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 10] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 22] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 2] (after 16251 iterations) memory (MB) | allocated: 13206.31005859375 | max allocated: 20669.92626953125 | reserved: 24404.0 | max reserved: 24404.0 [Rank 23] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 26] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 30] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 31] (after 16251 iterations) memory (MB) 
| allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 19] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 27] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 15] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 11] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 36] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 40] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 32] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 48] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 44] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 60] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0[Rank 56] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 72] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 
76] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 68] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 64] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 52] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 80] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 84] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 92] (after 16251 iterations) memory (MB) | allocated: 10797.1396484375 | max allocated: 16957.3212890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 88] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.7177734375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 100] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 104] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 96] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 120] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 112] (after 16251 iterations) memory (MB) | allocated: 10797.1396484375 | max allocated: 16957.3212890625 | 
reserved: 20072.0 | max reserved: 20072.0 [Rank 116] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 108] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 125] (after 16251 iterations) memory (MB) | allocated: 13251.47705078125 | max allocated: 20715.13818359375 | reserved: 24404.0 | max reserved: 24404.0 [Rank 35] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 38] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 34] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 39] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 46] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 43] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 55] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 47] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 51] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 54] (after 16251 iterations) memory (MB) | allocated: 
10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 71] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 75] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 63] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 59] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 67] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 42] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 58] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 79] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 70] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 83] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 66] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 111] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 87] (after 
16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 78] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 82] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0[Rank 86] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 74] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 95] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 91] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 103] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 94] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 107] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 62] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 98] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 119] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 
20072.0 | max reserved: 20072.0 [Rank 99] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 90] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 102] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 50] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 115] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 110] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 118] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 114] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 106] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 1] (after 16251 iterations) memory (MB) | allocated: 13206.55908203125 | max allocated: 20670.17529296875 | reserved: 24404.0 | max reserved: 24404.0 [Rank 13] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 21] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 17] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 
| max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 25] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 29] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 5] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 9] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 37] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 33] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 41] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 69] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 57] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 61] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 77] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 53] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 49] (after 16251 iterations) 
memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 65] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 45] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 93] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 89] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 85] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 73] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 101] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 81] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 109] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 97] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 105] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 121] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max 
reserved: 20072.0 [Rank 113] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 [Rank 117] (after 16251 iterations) memory (MB) | allocated: 10796.93798828125 | max allocated: 16957.11962890625 | reserved: 20072.0 | max reserved: 20072.0 iteration 16252/ 292968 | consumed samples: 33284096 | consumed tokens: 16903749632 | elapsed time per iteration (ms): 154371.1 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.693670E+00 | loss scale: 32768.0 | grad norm: 29217.715 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 81.60 | iteration 16253/ 292968 | consumed samples: 33286144 | consumed tokens: 16905699328 | elapsed time per iteration (ms): 150680.9 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.676610E+00 | loss scale: 32768.0 | grad norm: 24279.818 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 83.60 | iteration 16254/ 292968 | consumed samples: 33288192 | consumed tokens: 16907649024 | elapsed time per iteration (ms): 148108.2 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.691706E+00 | loss scale: 32768.0 | grad norm: 17456.377 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.05 | iteration 16255/ 292968 | consumed samples: 33290240 | consumed tokens: 16909598720 | elapsed time per iteration (ms): 148058.6 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.683968E+00 | loss scale: 32768.0 | grad norm: 24759.131 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.08 | iteration 16256/ 292968 | consumed samples: 33292288 | 
consumed tokens: 16911548416 | elapsed time per iteration (ms): 147501.2 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.707136E+00 | loss scale: 32768.0 | grad norm: 31535.685 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.40 | iteration 16257/ 292968 | consumed samples: 33294336 | consumed tokens: 16913498112 | elapsed time per iteration (ms): 148635.9 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.681084E+00 | loss scale: 32768.0 | grad norm: 36272.262 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 84.75 | iteration 16258/ 292968 | consumed samples: 33296384 | consumed tokens: 16915447808 | elapsed time per iteration (ms): 147513.2 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.683350E+00 | loss scale: 32768.0 | grad norm: 23139.820 | num zeros: 0.0 | curriculum seqlen: 952 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.39 | iteration 16259/ 292968 | consumed samples: 33298432 | consumed tokens: 16917413888 | elapsed time per iteration (ms): 148695.5 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.697510E+00 | loss scale: 32768.0 | grad norm: 23984.915 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.42 | iteration 16260/ 292968 | consumed samples: 33300480 | consumed tokens: 16919379968 | elapsed time per iteration (ms): 148373.4 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.678677E+00 | loss scale: 32768.0 | grad norm: 27709.680 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.61 | iteration 16261/ 292968 | consumed samples: 
33302528 | consumed tokens: 16921346048 | elapsed time per iteration (ms): 147467.2 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.676830E+00 | loss scale: 32768.0 | grad norm: 24708.072 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.14 | iteration 16262/ 292968 | consumed samples: 33304576 | consumed tokens: 16923312128 | elapsed time per iteration (ms): 148212.9 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.691204E+00 | loss scale: 32768.0 | grad norm: 32857.154 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.70 | iteration 16263/ 292968 | consumed samples: 33306624 | consumed tokens: 16925278208 | elapsed time per iteration (ms): 149427.4 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.801060E+00 | loss scale: 32768.0 | grad norm: 59991.018 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.01 | iteration 16264/ 292968 | consumed samples: 33308672 | consumed tokens: 16927244288 | elapsed time per iteration (ms): 148716.5 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.747974E+00 | loss scale: 32768.0 | grad norm: 29772.465 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.41 | iteration 16265/ 292968 | consumed samples: 33310720 | consumed tokens: 16929210368 | elapsed time per iteration (ms): 147279.4 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.787042E+00 | loss scale: 32768.0 | grad norm: 56741.879 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.25 | iteration 16266/ 292968 | 
consumed samples: 33312768 | consumed tokens: 16931176448 | elapsed time per iteration (ms): 147203.2 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.735637E+00 | loss scale: 32768.0 | grad norm: 24814.166 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.29 | iteration 16267/ 292968 | consumed samples: 33314816 | consumed tokens: 16933142528 | elapsed time per iteration (ms): 146833.4 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.764445E+00 | loss scale: 32768.0 | grad norm: 65046.784 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.51 | iteration 16268/ 292968 | consumed samples: 33316864 | consumed tokens: 16935108608 | elapsed time per iteration (ms): 147067.8 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.716346E+00 | loss scale: 32768.0 | grad norm: 27828.801 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.37 | iteration 16269/ 292968 | consumed samples: 33318912 | consumed tokens: 16937074688 | elapsed time per iteration (ms): 146564.6 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.712730E+00 | loss scale: 32768.0 | grad norm: 53999.001 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.67 | iteration 16270/ 292968 | consumed samples: 33320960 | consumed tokens: 16939040768 | elapsed time per iteration (ms): 177808.0 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.713374E+00 | loss scale: 32768.0 | grad norm: 38499.758 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 71.44 | iteration 
16271/ 292968 | consumed samples: 33323008 | consumed tokens: 16941006848 | elapsed time per iteration (ms): 146506.5 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.726478E+00 | loss scale: 32768.0 | grad norm: 51892.787 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.70 | iteration 16272/ 292968 | consumed samples: 33325056 | consumed tokens: 16942972928 | elapsed time per iteration (ms): 148659.2 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.713340E+00 | loss scale: 32768.0 | grad norm: 29860.511 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.45 | iteration 16273/ 292968 | consumed samples: 33327104 | consumed tokens: 16944939008 | elapsed time per iteration (ms): 146355.1 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.752392E+00 | loss scale: 32768.0 | grad norm: 34396.509 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.79 | iteration 16274/ 292968 | consumed samples: 33329152 | consumed tokens: 16946905088 | elapsed time per iteration (ms): 146468.1 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.704068E+00 | loss scale: 32768.0 | grad norm: 25051.216 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.72 | iteration 16275/ 292968 | consumed samples: 33331200 | consumed tokens: 16948871168 | elapsed time per iteration (ms): 146279.3 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.682869E+00 | loss scale: 32768.0 | grad norm: 23282.894 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.84 | 
iteration 16276/ 292968 | consumed samples: 33333248 | consumed tokens: 16950837248 | elapsed time per iteration (ms): 147149.0 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.725096E+00 | loss scale: 32768.0 | grad norm: 23773.336 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.32 | iteration 16277/ 292968 | consumed samples: 33335296 | consumed tokens: 16952803328 | elapsed time per iteration (ms): 146574.6 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.716102E+00 | loss scale: 32768.0 | grad norm: 30305.236 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.66 | iteration 16278/ 292968 | consumed samples: 33337344 | consumed tokens: 16954769408 | elapsed time per iteration (ms): 147256.8 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.689500E+00 | loss scale: 32768.0 | grad norm: 32081.841 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.26 | iteration 16279/ 292968 | consumed samples: 33339392 | consumed tokens: 16956735488 | elapsed time per iteration (ms): 146349.4 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.692023E+00 | loss scale: 32768.0 | grad norm: 30448.899 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.79 | iteration 16280/ 292968 | consumed samples: 33341440 | consumed tokens: 16958701568 | elapsed time per iteration (ms): 146508.5 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.687955E+00 | loss scale: 32768.0 | grad norm: 26826.616 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | 
TFLOPs: 86.70 | iteration 16281/ 292968 | consumed samples: 33343488 | consumed tokens: 16960667648 | elapsed time per iteration (ms): 174130.0 | learning rate: 5.944E-05 | global batch size: 2048 | lm loss: 2.703573E+00 | loss scale: 32768.0 | grad norm: 21785.703 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 72.95 | iteration 16282/ 292968 | consumed samples: 33345536 | consumed tokens: 16962633728 | elapsed time per iteration (ms): 146562.9 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.701005E+00 | loss scale: 32768.0 | grad norm: 24662.036 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.67 | iteration 16283/ 292968 | consumed samples: 33347584 | consumed tokens: 16964599808 | elapsed time per iteration (ms): 146460.0 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.717501E+00 | loss scale: 32768.0 | grad norm: 28573.112 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.73 | iteration 16284/ 292968 | consumed samples: 33349632 | consumed tokens: 16966565888 | elapsed time per iteration (ms): 146710.1 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.690088E+00 | loss scale: 32768.0 | grad norm: 24590.599 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.58 | iteration 16285/ 292968 | consumed samples: 33351680 | consumed tokens: 16968531968 | elapsed time per iteration (ms): 146218.2 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.676432E+00 | loss scale: 32768.0 | grad norm: 28385.551 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 0.014 | TFLOPs: 86.87 | iteration 16286/ 292968 | consumed samples: 33353728 | consumed tokens: 16970498048 | elapsed time per iteration (ms): 146841.0 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.695472E+00 | loss scale: 32768.0 | grad norm: 29809.382 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.50 | iteration 16287/ 292968 | consumed samples: 33355776 | consumed tokens: 16972464128 | elapsed time per iteration (ms): 146586.2 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.702935E+00 | loss scale: 32768.0 | grad norm: 28308.510 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.65 | iteration 16288/ 292968 | consumed samples: 33357824 | consumed tokens: 16974430208 | elapsed time per iteration (ms): 147270.5 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.697923E+00 | loss scale: 32768.0 | grad norm: 25514.668 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.25 | iteration 16289/ 292968 | consumed samples: 33359872 | consumed tokens: 16976396288 | elapsed time per iteration (ms): 146921.8 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.698291E+00 | loss scale: 32768.0 | grad norm: 30894.267 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.46 | iteration 16290/ 292968 | consumed samples: 33361920 | consumed tokens: 16978362368 | elapsed time per iteration (ms): 147169.9 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.688551E+00 | loss scale: 32768.0 | grad norm: 37020.150 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 0.014 | TFLOPs: 86.31 | iteration 16291/ 292968 | consumed samples: 33363968 | consumed tokens: 16980328448 | elapsed time per iteration (ms): 147462.0 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.692477E+00 | loss scale: 32768.0 | grad norm: 29931.478 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.14 | iteration 16292/ 292968 | consumed samples: 33366016 | consumed tokens: 16982294528 | elapsed time per iteration (ms): 147593.7 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.688833E+00 | loss scale: 32768.0 | grad norm: 45218.090 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.06 | iteration 16293/ 292968 | consumed samples: 33368064 | consumed tokens: 16984260608 | elapsed time per iteration (ms): 154743.8 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.707209E+00 | loss scale: 32768.0 | grad norm: 28925.841 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.09 | iteration 16294/ 292968 | consumed samples: 33370112 | consumed tokens: 16986226688 | elapsed time per iteration (ms): 146666.3 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.669970E+00 | loss scale: 32768.0 | grad norm: 21494.095 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.61 | iteration 16295/ 292968 | consumed samples: 33372160 | consumed tokens: 16988192768 | elapsed time per iteration (ms): 146602.8 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.716291E+00 | loss scale: 32768.0 | grad norm: 22257.743 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 0.014 | TFLOPs: 86.64 | iteration 16296/ 292968 | consumed samples: 33374208 | consumed tokens: 16990158848 | elapsed time per iteration (ms): 147059.1 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.706963E+00 | loss scale: 32768.0 | grad norm: 31249.587 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.38 | iteration 16297/ 292968 | consumed samples: 33376256 | consumed tokens: 16992124928 | elapsed time per iteration (ms): 147214.8 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.684573E+00 | loss scale: 32768.0 | grad norm: 33282.315 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.28 | iteration 16298/ 292968 | consumed samples: 33378304 | consumed tokens: 16994091008 | elapsed time per iteration (ms): 147823.7 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.686365E+00 | loss scale: 32768.0 | grad norm: 17203.171 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.93 | iteration 16299/ 292968 | consumed samples: 33380352 | consumed tokens: 16996057088 | elapsed time per iteration (ms): 146738.9 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.713868E+00 | loss scale: 32768.0 | grad norm: 27829.808 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.56 | iteration 16300/ 292968 | consumed samples: 33382400 | consumed tokens: 16998023168 | elapsed time per iteration (ms): 147287.0 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.694697E+00 | loss scale: 32768.0 | grad norm: 30786.618 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.24 | saving checkpoint at iteration 16300 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 [2022-02-05 10:10:26,320] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/mp_rank_01_model_states.pt [2022-02-05 10:10:26,476] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/mp_rank_00_model_states.pt [2022-02-05 10:14:01,582] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-02-05 10:14:01,601] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-02-05 10:14:01,854] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-02-05 10:14:02,119] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-02-05 10:14:02,931] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-02-05 10:14:03,015] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-02-05 10:14:03,389] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-02-05 10:14:03,416] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-02-05 10:14:03,498] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-02-05 10:14:03,677] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-02-05 10:14:03,730] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-02-05 10:14:03,829] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-02-05 10:14:03,921] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-02-05 10:14:04,139] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-02-05 10:14:04,328] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-02-05 
10:14:04,385] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-02-05 10:14:04,399] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-02-05 10:14:04,509] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-02-05 10:14:04,527] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-02-05 10:14:04,548] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-02-05 10:14:04,827] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-02-05 10:14:04,882] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-02-05 10:14:05,032] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-02-05 10:14:05,290] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-02-05 10:14:05,726] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-02-05 10:14:05,747] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-02-05 10:14:05,632] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-02-05 10:14:05,722] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-02-05 10:14:06,002] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-02-05 10:14:06,064] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-02-05 10:14:06,103] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-02-05 10:14:06,119] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-02-05 10:14:06,605] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-02-05 10:14:06,650] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-02-05 10:14:06,869] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-02-05 10:14:06,875] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-02-05 10:14:07,040] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-02-05 10:14:07,069] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-02-05 10:14:07,073] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-02-05 10:14:07,030] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-02-05 10:14:07,196] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-02-05 10:14:07,269] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-02-05 10:14:07,252] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-02-05 10:14:07,287] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-02-05 10:14:07,048] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-02-05 10:14:07,582] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-02-05 10:14:07,713] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-02-05 10:14:07,751] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-02-05 10:14:07,814] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-02-05 10:14:08,129] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-02-05 
10:14:08,153] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-02-05 10:14:08,221] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-02-05 10:14:08,425] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-02-05 10:14:08,536] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-02-05 10:14:08,744] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-02-05 10:14:08,819] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-02-05 10:14:08,846] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-02-05 10:14:08,949] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-02-05 10:14:09,073] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-02-05 10:14:09,079] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-02-05 10:14:09,218] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-02-05 10:14:09,226] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-02-05 10:14:09,229] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-02-05 10:14:09,238] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-02-05 10:14:09,763] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-02-05 10:14:09,897] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-02-05 10:14:09,910] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-02-05 10:14:09,936] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-02-05 10:14:09,806] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-02-05 10:14:10,003] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-02-05 10:14:11,581] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-02-05 10:14:11,648] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_124_optim_states.pt [2022-02-05 10:14:11,644] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-02-05 10:14:11,739] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-02-05 10:14:11,921] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-02-05 10:14:11,926] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-02-05 10:14:12,137] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-02-05 10:14:12,164] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-02-05 10:14:12,223] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-02-05 10:14:12,229] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-02-05 10:14:12,242] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-02-05 10:14:12,406] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-02-05 10:14:12,859] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-02-05 10:14:12,959] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-02-05 10:14:13,138] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-02-05 
10:14:13,237] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-02-05 10:14:13,346] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-02-05 10:14:13,484] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-02-05 10:14:13,608] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-02-05 10:14:13,657] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-02-05 10:14:14,095] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-02-05 10:14:14,135] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-02-05 10:14:14,183] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-02-05 10:14:14,469] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-02-05 10:14:14,527] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-02-05 10:14:14,571] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-02-05 10:14:14,765] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-02-05 10:14:14,692] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-02-05 10:14:14,740] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-02-05 10:14:15,244] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-02-05 10:14:15,289] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-02-05 10:14:15,329] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-02-05 10:14:15,473] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-02-05 10:14:15,517] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-02-05 10:14:16,261] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-02-05 10:14:16,295] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-02-05 10:14:16,306] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-02-05 10:14:16,565] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-02-05 10:14:16,673] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-02-05 10:14:16,781] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-02-05 10:14:17,704] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-02-05 10:14:17,778] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-02-05 10:14:18,144] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-02-05 10:14:18,392] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-02-05 10:14:18,676] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-02-05 10:14:18,679] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-02-05 10:14:18,714] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-02-05 10:14:18,783] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-02-05 10:14:18,815] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-02-05 10:14:21,630] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-02-05 
10:14:25,926] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-02-05 10:14:25,954] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-02-05 10:14:27,809] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-02-05 10:14:27,936] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-02-05 10:14:31,041] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-02-05 10:14:31,757] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-02-05 10:14:33,968] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-02-05 10:14:34,027] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16300/zero_pp_rank_0_mp_rank_02_optim_states.pt successfully saved checkpoint at iteration 16300 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 time (ms) | save-checkpoint: 263896.55 iteration 16301/ 292968 | consumed samples: 33384448 | 
consumed tokens: 16999989248 | elapsed time per iteration (ms): 417811.5 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.681556E+00 | loss scale: 32768.0 | grad norm: 21973.176 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.005 | TFLOPs: 30.40 | iteration 16302/ 292968 | consumed samples: 33386496 | consumed tokens: 17001955328 | elapsed time per iteration (ms): 148683.4 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.683394E+00 | loss scale: 32768.0 | grad norm: 16768.833 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.43 | iteration 16303/ 292968 | consumed samples: 33388544 | consumed tokens: 17003921408 | elapsed time per iteration (ms): 147885.1 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.690031E+00 | loss scale: 32768.0 | grad norm: 20665.472 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.89 | iteration 16304/ 292968 | consumed samples: 33390592 | consumed tokens: 17005887488 | elapsed time per iteration (ms): 147226.5 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.688045E+00 | loss scale: 32768.0 | grad norm: 25012.401 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.28 | iteration 16305/ 292968 | consumed samples: 33392640 | consumed tokens: 17007853568 | elapsed time per iteration (ms): 147259.1 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.710673E+00 | loss scale: 32768.0 | grad norm: 29836.360 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.26 | iteration 16306/ 292968 | consumed samples: 
33394688 | consumed tokens: 17009819648 | elapsed time per iteration (ms): 147677.8 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.699476E+00 | loss scale: 32768.0 | grad norm: 30546.658 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.01 | iteration 16307/ 292968 | consumed samples: 33396736 | consumed tokens: 17011785728 | elapsed time per iteration (ms): 146671.5 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.684245E+00 | loss scale: 32768.0 | grad norm: 34892.057 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.60 | iteration 16308/ 292968 | consumed samples: 33398784 | consumed tokens: 17013751808 | elapsed time per iteration (ms): 146526.2 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.710635E+00 | loss scale: 32768.0 | grad norm: 27416.150 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.69 | iteration 16309/ 292968 | consumed samples: 33400832 | consumed tokens: 17015717888 | elapsed time per iteration (ms): 147248.6 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.661435E+00 | loss scale: 32768.0 | grad norm: 25619.315 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.26 | iteration 16310/ 292968 | consumed samples: 33402880 | consumed tokens: 17017683968 | elapsed time per iteration (ms): 146959.9 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.655688E+00 | loss scale: 32768.0 | grad norm: 23399.659 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.43 | iteration 16311/ 292968 | 
consumed samples: 33404928 | consumed tokens: 17019650048 | elapsed time per iteration (ms): 147533.3 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.708743E+00 | loss scale: 32768.0 | grad norm: 16414.348 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.10 | iteration 16312/ 292968 | consumed samples: 33406976 | consumed tokens: 17021616128 | elapsed time per iteration (ms): 146876.6 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.699538E+00 | loss scale: 32768.0 | grad norm: 20079.449 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.48 | iteration 16313/ 292968 | consumed samples: 33409024 | consumed tokens: 17023582208 | elapsed time per iteration (ms): 146396.7 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.646977E+00 | loss scale: 32768.0 | grad norm: 32612.611 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.77 | iteration 16314/ 292968 | consumed samples: 33411072 | consumed tokens: 17025548288 | elapsed time per iteration (ms): 145947.5 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.691743E+00 | loss scale: 32768.0 | grad norm: 41987.950 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.03 | iteration 16315/ 292968 | consumed samples: 33413120 | consumed tokens: 17027514368 | elapsed time per iteration (ms): 146083.8 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.698901E+00 | loss scale: 32768.0 | grad norm: 24319.253 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.95 | iteration 
16316/ 292968 | consumed samples: 33415168 | consumed tokens: 17029480448 | elapsed time per iteration (ms): 145971.6 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.679266E+00 | loss scale: 32768.0 | grad norm: 34679.424 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.02 | iteration 16317/ 292968 | consumed samples: 33417216 | consumed tokens: 17031446528 | elapsed time per iteration (ms): 146418.0 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.671961E+00 | loss scale: 32768.0 | grad norm: 41749.448 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.75 | iteration 16318/ 292968 | consumed samples: 33419264 | consumed tokens: 17033412608 | elapsed time per iteration (ms): 146281.4 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.693718E+00 | loss scale: 32768.0 | grad norm: 19518.047 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.83 | iteration 16319/ 292968 | consumed samples: 33421312 | consumed tokens: 17035378688 | elapsed time per iteration (ms): 146171.5 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.692381E+00 | loss scale: 32768.0 | grad norm: 28930.392 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.90 | iteration 16320/ 292968 | consumed samples: 33423360 | consumed tokens: 17037344768 | elapsed time per iteration (ms): 146198.2 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.755946E+00 | loss scale: 32768.0 | grad norm: 26361.920 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.88 | 
iteration 16321/ 292968 | consumed samples: 33425408 | consumed tokens: 17039310848 | elapsed time per iteration (ms): 146886.0 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.682236E+00 | loss scale: 32768.0 | grad norm: 31779.183 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.48 | iteration 16322/ 292968 | consumed samples: 33427456 | consumed tokens: 17041276928 | elapsed time per iteration (ms): 146544.5 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.734755E+00 | loss scale: 32768.0 | grad norm: 52561.351 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.68 | iteration 16323/ 292968 | consumed samples: 33429504 | consumed tokens: 17043243008 | elapsed time per iteration (ms): 146516.8 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.688018E+00 | loss scale: 32768.0 | grad norm: 25123.413 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.69 | iteration 16324/ 292968 | consumed samples: 33431552 | consumed tokens: 17045209088 | elapsed time per iteration (ms): 147416.6 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.702058E+00 | loss scale: 32768.0 | grad norm: 41412.833 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.17 | iteration 16325/ 292968 | consumed samples: 33433600 | consumed tokens: 17047175168 | elapsed time per iteration (ms): 146622.6 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.701560E+00 | loss scale: 32768.0 | grad norm: 32410.022 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | 
TFLOPs: 86.63 | iteration 16326/ 292968 | consumed samples: 33435648 | consumed tokens: 17049141248 | elapsed time per iteration (ms): 147862.3 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.703492E+00 | loss scale: 32768.0 | grad norm: 55111.493 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.91 | iteration 16327/ 292968 | consumed samples: 33437696 | consumed tokens: 17051107328 | elapsed time per iteration (ms): 146562.7 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.711676E+00 | loss scale: 32768.0 | grad norm: 29661.740 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.67 | iteration 16328/ 292968 | consumed samples: 33439744 | consumed tokens: 17053073408 | elapsed time per iteration (ms): 146632.8 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.716322E+00 | loss scale: 32768.0 | grad norm: 67171.069 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.63 | iteration 16329/ 292968 | consumed samples: 33441792 | consumed tokens: 17055039488 | elapsed time per iteration (ms): 148218.2 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.723939E+00 | loss scale: 32768.0 | grad norm: 42147.548 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.70 | iteration 16330/ 292968 | consumed samples: 33443840 | consumed tokens: 17057005568 | elapsed time per iteration (ms): 146682.6 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.738926E+00 | loss scale: 32768.0 | grad norm: 44342.725 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 0.014 | TFLOPs: 86.60 | iteration 16331/ 292968 | consumed samples: 33445888 | consumed tokens: 17058971648 | elapsed time per iteration (ms): 146595.8 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.711863E+00 | loss scale: 32768.0 | grad norm: 39322.169 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.65 | iteration 16332/ 292968 | consumed samples: 33447936 | consumed tokens: 17060937728 | elapsed time per iteration (ms): 146507.8 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.697363E+00 | loss scale: 32768.0 | grad norm: 32735.097 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.70 | iteration 16333/ 292968 | consumed samples: 33449984 | consumed tokens: 17062903808 | elapsed time per iteration (ms): 146865.0 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.703079E+00 | loss scale: 32768.0 | grad norm: 36921.684 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.49 | iteration 16334/ 292968 | consumed samples: 33452032 | consumed tokens: 17064869888 | elapsed time per iteration (ms): 146424.1 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.688407E+00 | loss scale: 32768.0 | grad norm: 35350.892 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.75 | iteration 16335/ 292968 | consumed samples: 33454080 | consumed tokens: 17066835968 | elapsed time per iteration (ms): 146249.6 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.693755E+00 | loss scale: 32768.0 | grad norm: 35964.129 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 0.014 | TFLOPs: 86.85 | iteration 16336/ 292968 | consumed samples: 33456128 | consumed tokens: 17068802048 | elapsed time per iteration (ms): 147756.5 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.689548E+00 | loss scale: 32768.0 | grad norm: 34236.203 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.97 | iteration 16337/ 292968 | consumed samples: 33458176 | consumed tokens: 17070768128 | elapsed time per iteration (ms): 146620.9 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.703422E+00 | loss scale: 32768.0 | grad norm: 30917.702 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.63 | iteration 16338/ 292968 | consumed samples: 33460224 | consumed tokens: 17072734208 | elapsed time per iteration (ms): 146491.7 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.675190E+00 | loss scale: 32768.0 | grad norm: 28077.354 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.71 | iteration 16339/ 292968 | consumed samples: 33462272 | consumed tokens: 17074700288 | elapsed time per iteration (ms): 146227.7 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.706941E+00 | loss scale: 32768.0 | grad norm: 28610.262 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.87 | iteration 16340/ 292968 | consumed samples: 33464320 | consumed tokens: 17076666368 | elapsed time per iteration (ms): 146650.3 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.727561E+00 | loss scale: 32768.0 | grad norm: 33629.720 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 0.014 | TFLOPs: 86.62 | iteration 16341/ 292968 | consumed samples: 33466368 | consumed tokens: 17078632448 | elapsed time per iteration (ms): 146869.2 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.710932E+00 | loss scale: 32768.0 | grad norm: 40800.563 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.49 | iteration 16342/ 292968 | consumed samples: 33468416 | consumed tokens: 17080598528 | elapsed time per iteration (ms): 146076.4 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.705940E+00 | loss scale: 32768.0 | grad norm: 30553.436 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.96 | iteration 16343/ 292968 | consumed samples: 33470464 | consumed tokens: 17082564608 | elapsed time per iteration (ms): 146224.3 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.707341E+00 | loss scale: 32768.0 | grad norm: 47557.679 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.87 | iteration 16344/ 292968 | consumed samples: 33472512 | consumed tokens: 17084530688 | elapsed time per iteration (ms): 146586.2 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.692413E+00 | loss scale: 32768.0 | grad norm: 25260.808 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.65 | iteration 16345/ 292968 | consumed samples: 33474560 | consumed tokens: 17086496768 | elapsed time per iteration (ms): 148484.3 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.702203E+00 | loss scale: 32768.0 | grad norm: 35993.015 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.55 | iteration 16346/ 292968 | consumed samples: 33476608 | consumed tokens: 17088462848 | elapsed time per iteration (ms): 146528.2 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.695250E+00 | loss scale: 32768.0 | grad norm: 24793.903 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.69 | iteration 16347/ 292968 | consumed samples: 33478656 | consumed tokens: 17090428928 | elapsed time per iteration (ms): 146718.0 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.719028E+00 | loss scale: 32768.0 | grad norm: 39132.717 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.58 | iteration 16348/ 292968 | consumed samples: 33480704 | consumed tokens: 17092395008 | elapsed time per iteration (ms): 146873.5 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.695376E+00 | loss scale: 32768.0 | grad norm: 33408.531 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.48 | iteration 16349/ 292968 | consumed samples: 33482752 | consumed tokens: 17094361088 | elapsed time per iteration (ms): 146574.9 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.685834E+00 | loss scale: 32768.0 | grad norm: 30370.657 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.66 | iteration 16350/ 292968 | consumed samples: 33484800 | consumed tokens: 17096327168 | elapsed time per iteration (ms): 146642.5 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.677087E+00 | loss scale: 32768.0 | grad norm: 26562.353 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.62 |
-------------------------------------------------------------------------------------------
valid loss at iteration 16350 | lm loss value: 3.026082E+00 | lm loss PPL: 2.061630E+01 |
-------------------------------------------------------------------------------------------
saving checkpoint at iteration 16350 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
[2022-02-05 12:25:02,790] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/mp_rank_00_model_states.pt
[2022-02-05 12:25:02,802] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/mp_rank_01_model_states.pt
[2022-02-05 12:26:37,578] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_120_optim_states.pt
[2022-02-05 12:26:39,987] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_121_optim_states.pt
[2022-02-05 12:26:41,097] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_98_optim_states.pt
[2022-02-05 12:26:41,810] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_111_optim_states.pt
[2022-02-05 12:26:42,248] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-02-05 12:26:42,407] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-02-05 12:26:42,437] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-02-05 12:26:42,453] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-02-05 12:26:42,512] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-02-05 12:26:42,626] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-02-05 12:26:42,631] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-02-05 12:26:42,615] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-02-05 12:26:43,092] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-02-05 12:26:43,286] [INFO] [engine.py:3023:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-02-05 12:26:44,106] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-02-05 12:26:44,242] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-02-05 12:26:44,434] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_124_optim_states.pt [2022-02-05 12:26:44,676] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-02-05 12:26:44,722] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-02-05 12:26:44,777] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-02-05 12:26:44,847] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-02-05 12:26:44,915] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-02-05 12:26:46,144] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-02-05 12:26:46,339] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-02-05 12:26:46,709] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-02-05 12:26:46,777] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-02-05 12:26:47,005] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-02-05 12:26:47,058] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-02-05 12:26:47,472] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-02-05 12:26:47,618] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-02-05 12:26:47,624] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-02-05 
12:26:47,715] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-02-05 12:26:47,834] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-02-05 12:26:47,883] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-02-05 12:26:47,940] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-02-05 12:26:48,046] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-02-05 12:26:48,146] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-02-05 12:26:48,238] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-02-05 12:26:48,265] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-02-05 12:26:48,314] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-02-05 12:26:48,354] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-02-05 12:26:48,392] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-02-05 12:26:48,413] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-02-05 12:26:48,486] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-02-05 12:26:48,486] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-02-05 12:26:48,613] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-02-05 12:26:48,628] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-02-05 12:26:48,652] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-02-05 12:26:48,683] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-02-05 12:26:48,852] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-02-05 12:26:48,905] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-02-05 12:26:49,549] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-02-05 12:26:49,591] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-02-05 12:26:49,626] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-02-05 12:26:49,639] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-02-05 12:26:49,541] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-02-05 12:26:49,955] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-02-05 12:26:50,055] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-02-05 12:26:49,998] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-02-05 12:26:50,188] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-02-05 12:26:50,370] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-02-05 12:26:50,536] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-02-05 12:26:50,946] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-02-05 12:26:50,862] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-02-05 12:26:51,247] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-02-05 12:26:51,433] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-02-05 
12:26:51,624] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-02-05 12:26:52,030] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-02-05 12:26:52,205] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-02-05 12:26:52,539] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-02-05 12:26:52,609] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-02-05 12:26:52,825] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-02-05 12:26:52,866] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-02-05 12:26:53,000] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-02-05 12:26:53,015] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-02-05 12:26:53,536] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-02-05 12:26:53,757] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-02-05 12:26:53,986] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-02-05 12:26:54,157] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-02-05 12:26:54,179] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-02-05 12:26:54,242] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-02-05 12:26:54,367] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-02-05 12:26:54,438] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-02-05 12:26:54,463] [INFO] [engine.py:3023:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-02-05 12:26:54,628] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-02-05 12:26:54,748] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-02-05 12:26:54,696] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-02-05 12:26:54,865] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-02-05 12:26:54,895] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-02-05 12:26:55,001] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-02-05 12:26:54,708] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-02-05 12:26:54,722] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-02-05 12:26:54,725] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_28_optim_states.pt
[2022-02-05 12:26:55,079] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_83_optim_states.pt
[2022-02-05 12:26:54,937] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_25_optim_states.pt
[2022-02-05 12:26:55,221] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_86_optim_states.pt
[2022-02-05 12:26:55,306] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_84_optim_states.pt
[2022-02-05 12:26:55,313] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_87_optim_states.pt
[2022-02-05 12:26:55,319] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_50_optim_states.pt
[2022-02-05 12:26:55,347] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_51_optim_states.pt
[2022-02-05 12:26:55,806] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_48_optim_states.pt
[2022-02-05 12:26:55,952] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_49_optim_states.pt
[2022-02-05 12:26:56,062] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_41_optim_states.pt
[2022-02-05 12:26:56,381] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_40_optim_states.pt
[2022-02-05 12:26:56,434] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_78_optim_states.pt
[2022-02-05 12:26:56,449] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_80_optim_states.pt
[2022-02-05 12:26:56,595] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_68_optim_states.pt
[2022-02-05 12:26:56,635] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_81_optim_states.pt
[2022-02-05 12:26:56,695] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_69_optim_states.pt
[2022-02-05 12:26:56,851] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_55_optim_states.pt
[2022-02-05 12:26:56,916] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_54_optim_states.pt
[2022-02-05 12:26:57,248] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_01_optim_states.pt
[2022-02-05 12:26:57,279] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_52_optim_states.pt
[2022-02-05 12:26:57,536] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_53_optim_states.pt
[2022-02-05 12:26:57,564] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_79_optim_states.pt
[2022-02-05 12:26:57,595] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_72_optim_states.pt
[2022-02-05 12:26:57,703] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_73_optim_states.pt
[2022-02-05 12:26:57,724] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_27_optim_states.pt
[2022-02-05 12:26:57,750] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_26_optim_states.pt
[2022-02-05 12:26:57,945] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_63_optim_states.pt
[2022-02-05 12:26:57,950] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_62_optim_states.pt
[2022-02-05 12:26:58,439] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_74_optim_states.pt
[2022-02-05 12:26:59,048] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_75_optim_states.pt
[2022-02-05 12:26:59,492] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_05_optim_states.pt
[2022-02-05 12:27:00,337] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2022-02-05 12:27:00,394] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_04_optim_states.pt
[2022-02-05 12:27:02,088] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_03_optim_states.pt
[2022-02-05 12:27:02,220] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16350/zero_pp_rank_0_mp_rank_02_optim_states.pt
successfully saved checkpoint at iteration 16350 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
time (ms) | save-checkpoint: 144070.83
iteration 16351/ 292968 | consumed samples: 33486848 | consumed tokens: 17098293248 | elapsed time per iteration (ms): 756708.9 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.689797E+00 | loss scale: 32768.0 | grad norm: 22430.271 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.003 | TFLOPs: 16.79 |
iteration 16352/ 292968 | consumed samples: 33488896 | consumed tokens: 17100259328 | elapsed time per iteration (ms): 150997.4 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.713041E+00 | loss scale: 32768.0 | grad norm: 24192.787 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 84.12 |
iteration 16353/ 292968 | consumed samples: 33490944 | consumed tokens: 17102225408 | elapsed time per iteration (ms): 149788.9 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.691544E+00 | loss scale: 32768.0 | grad norm: 31571.930 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 84.80 |
iteration 16354/ 292968 | consumed samples: 33492992 | consumed tokens: 17104191488 | elapsed time per iteration (ms): 150012.5 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.692786E+00 | loss scale: 32768.0 | grad norm: 33816.673 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 84.67 |
iteration 16355/ 292968 | consumed samples: 33495040 | consumed tokens: 17106157568 | elapsed time per iteration (ms): 152359.1 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.707805E+00 | loss scale: 32768.0 | grad norm: 39519.101 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.37 |
iteration 16356/ 292968 | consumed samples: 33497088 | consumed tokens: 17108123648 | elapsed time per iteration (ms): 148995.8 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.678679E+00 | loss scale: 32768.0 | grad norm: 26164.098 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.25 |
iteration 16357/ 292968 | consumed samples: 33499136 | consumed tokens: 17110089728 | elapsed time per iteration (ms): 148237.4 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.711006E+00 | loss scale: 32768.0 | grad norm: 31718.858 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.69 |
iteration 16358/ 292968 | consumed samples: 33501184 | consumed tokens: 17112055808 | elapsed time per iteration (ms): 148185.6 | learning rate: 5.943E-05 | global batch size: 2048 | lm loss: 2.700021E+00 | loss scale: 32768.0 | grad norm: 32735.028 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.72 |
iteration 16359/ 292968 | consumed samples: 33503232 | consumed tokens: 17114021888 | elapsed time per iteration (ms): 148528.2 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.700378E+00 | loss scale: 32768.0 | grad norm: 29055.179 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.52 |
iteration 16360/ 292968 | consumed samples: 33505280 | consumed tokens: 17115987968 | elapsed time per iteration (ms): 151264.0 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.669737E+00 | loss scale: 32768.0 | grad norm: 33009.463 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 83.97 |
iteration 16361/ 292968 | consumed samples: 33507328 | consumed tokens: 17117954048 | elapsed time per iteration (ms): 148257.8 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.669935E+00 | loss scale: 32768.0 | grad norm: 33757.548 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.68 |
iteration 16362/ 292968 | consumed samples: 33509376 | consumed tokens: 17119920128 | elapsed time per iteration (ms): 148749.1 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.696594E+00 | loss scale: 32768.0 | grad norm: 25633.580 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.39 |
iteration 16363/ 292968 | consumed samples: 33511424 | consumed tokens: 17121886208 | elapsed time per iteration (ms): 150300.9 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.677744E+00 | loss scale: 32768.0 | grad norm: 23162.606 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 84.51 |
iteration 16364/ 292968 | consumed samples: 33513472 | consumed tokens: 17123852288 | elapsed time per iteration (ms): 148166.6 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.697892E+00 | loss scale: 32768.0 | grad norm: 25875.054 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.73 |
iteration 16365/ 292968 | consumed samples: 33515520 | consumed tokens: 17125818368 | elapsed time per iteration (ms): 149033.4 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.699963E+00 | loss scale: 32768.0 | grad norm: 25380.845 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.23 |
iteration 16366/ 292968 | consumed samples: 33517568 | consumed tokens: 17127784448 | elapsed time per iteration (ms): 147077.2 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.686131E+00 | loss scale: 32768.0 | grad norm: 24429.680 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.36 |
iteration 16367/ 292968 | consumed samples: 33519616 | consumed tokens: 17129750528 | elapsed time per iteration (ms): 151469.4 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.718952E+00 | loss scale: 32768.0 | grad norm: 28545.683 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 83.86 |
iteration 16368/ 292968 | consumed samples: 33521664 | consumed tokens: 17131716608 | elapsed time per iteration (ms): 147045.6 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.722629E+00 | loss scale: 32768.0 | grad norm: 31635.115 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.38 |
iteration 16369/ 292968 | consumed samples: 33523712 | consumed tokens: 17133682688 | elapsed time per iteration (ms): 148257.4 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.709889E+00 | loss scale: 32768.0 | grad norm: 32168.263 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.68 |
iteration 16370/ 292968 | consumed samples: 33525760 | consumed tokens: 17135648768 | elapsed time per iteration (ms): 148104.6 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.716219E+00 | loss scale: 32768.0 | grad norm: 37077.411 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.77 |
iteration 16371/ 292968 | consumed samples: 33527808 | consumed tokens: 17137614848 | elapsed time per iteration (ms): 147627.0 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.699574E+00 | loss scale: 32768.0 | grad norm: 23881.478 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.04 |
iteration 16372/ 292968 | consumed samples: 33529856 | consumed tokens: 17139580928 | elapsed time per iteration (ms): 148088.8 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.686160E+00 | loss scale: 32768.0 | grad norm: 28520.070 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.77 |
iteration 16373/ 292968 | consumed samples: 33531904 | consumed tokens: 17141547008 | elapsed time per iteration (ms): 149978.3 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.691990E+00 | loss scale: 32768.0 | grad norm: 27871.815 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 84.69 |
iteration 16374/ 292968 | consumed samples: 33533952 | consumed tokens: 17143513088 | elapsed time per iteration (ms): 147778.0 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.717556E+00 | loss scale: 32768.0 | grad norm: 26313.305 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.95 |
iteration 16375/ 292968 | consumed samples: 33536000 | consumed tokens: 17145479168 | elapsed time per iteration (ms): 147379.1 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.717819E+00 | loss scale: 32768.0 | grad norm: 30307.563 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.19 |
iteration 16376/ 292968 | consumed samples: 33538048 | consumed tokens: 17147445248 | elapsed time per iteration (ms): 146723.5 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.701401E+00 | loss scale: 32768.0 | grad norm: 35823.872 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.57 |
iteration 16377/ 292968 | consumed samples: 33540096 | consumed tokens: 17149411328 | elapsed time per iteration (ms): 146884.8 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.717968E+00 | loss scale: 32768.0 | grad norm: 25611.850 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.48 |
iteration 16378/ 292968 | consumed samples: 33542144 | consumed tokens: 17151377408 | elapsed time per iteration (ms): 152229.1 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.750753E+00 | loss scale: 32768.0 | grad norm: 26497.202 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.44 |
iteration 16379/ 292968 | consumed samples: 33544192 | consumed tokens: 17153343488 | elapsed time per iteration (ms): 147112.7 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.727981E+00 | loss scale: 32768.0 | grad norm: 35818.882 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.34 |
iteration 16380/ 292968 | consumed samples: 33546240 | consumed tokens: 17155309568 | elapsed time per iteration (ms): 147431.9 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.720596E+00 | loss scale: 32768.0 | grad norm: 35143.007 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.16 |
iteration 16381/ 292968 | consumed samples: 33548288 | consumed tokens: 17157275648 | elapsed time per iteration (ms): 146872.3 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.718752E+00 | loss scale: 32768.0 | grad norm: 45417.472 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.48 |
iteration 16382/ 292968 | consumed samples: 33550336 | consumed tokens: 17159241728 | elapsed time per iteration (ms): 146642.1 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.772379E+00 | loss scale: 32768.0 | grad norm: 43824.320 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.62 |
iteration 16383/ 292968 | consumed samples: 33552384 | consumed tokens: 17161207808 | elapsed time per iteration (ms): 148299.3 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.700335E+00 | loss scale: 32768.0 | grad norm: 27095.257 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.65 |
iteration 16384/ 292968 | consumed samples: 33554432 | consumed tokens: 17163173888 | elapsed time per iteration (ms): 147378.4 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.726312E+00 | loss scale: 32768.0 | grad norm: 42073.767 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.19 |
iteration 16385/ 292968 | consumed samples: 33556480 | consumed tokens: 17165139968 | elapsed time per iteration (ms): 146757.2 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.715786E+00 | loss scale: 32768.0 | grad norm: 19161.914 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.55 |
iteration 16386/ 292968 | consumed samples: 33558528 | consumed tokens: 17167106048 | elapsed time per iteration (ms): 146628.1 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.702870E+00 | loss scale: 32768.0 | grad norm: 25169.939 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.63 |
iteration 16387/ 292968 | consumed samples: 33560576 | consumed tokens: 17169072128 | elapsed time per iteration (ms): 146881.0 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.717370E+00 | loss scale: 32768.0 | grad norm: 27320.133 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.48 |
iteration 16388/ 292968 | consumed samples: 33562624 | consumed tokens: 17171038208 | elapsed time per iteration (ms): 146544.7 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.724391E+00 | loss scale: 32768.0 | grad norm: 37171.633 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.68 |
iteration 16389/ 292968 | consumed samples: 33564672 | consumed tokens: 17173004288 | elapsed time per iteration (ms): 146189.8 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.690468E+00 | loss scale: 32768.0 | grad norm: 38520.762 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.89 |
iteration 16390/ 292968 | consumed samples: 33566720 | consumed tokens: 17174970368 | elapsed time per iteration (ms): 146498.8 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.691968E+00 | loss scale: 32768.0 | grad norm: 20242.923 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.71 |
iteration 16391/ 292968 | consumed samples: 33568768 | consumed tokens: 17176936448 | elapsed time per iteration (ms): 146438.1 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.682792E+00 | loss scale: 32768.0 | grad norm: 27712.283 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.74 |
iteration 16392/ 292968 | consumed samples: 33570816 | consumed tokens: 17178902528 | elapsed time per iteration (ms): 146088.8 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.697908E+00 | loss scale: 32768.0 | grad norm: 36378.053 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.95 |
iteration 16393/ 292968 | consumed samples: 33572864 | consumed tokens: 17180868608 | elapsed time per iteration (ms): 146284.6 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.701037E+00 | loss scale: 32768.0 | grad norm: 32283.501 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.83 |
iteration 16394/ 292968 | consumed samples: 33574912 | consumed tokens: 17182834688 | elapsed time per iteration (ms): 146370.4 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.696639E+00 | loss scale: 32768.0 | grad norm: 32220.102 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.78 |
iteration 16395/ 292968 | consumed samples: 33576960 | consumed tokens: 17184800768 | elapsed time per iteration (ms): 146545.6 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.708388E+00 | loss scale: 32768.0 | grad norm: 36534.604 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.68 |
iteration 16396/ 292968 | consumed samples: 33579008 | consumed tokens: 17186766848 | elapsed time per iteration (ms): 148445.6 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.685491E+00 | loss scale: 32768.0 | grad norm: 25618.058 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.57 |
iteration 16397/ 292968 | consumed samples: 33581056 | consumed tokens: 17188732928 | elapsed time per iteration (ms): 146275.1 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.682512E+00 | loss scale: 32768.0 | grad norm: 33042.085 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.84 |
iteration 16398/ 292968 | consumed samples: 33583104 | consumed tokens: 17190699008 | elapsed time per iteration (ms): 146381.6 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.693375E+00 | loss scale: 32768.0 | grad norm: 30538.729 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.77 |
iteration 16399/ 292968 | consumed samples: 33585152 | consumed tokens: 17192665088 | elapsed time per iteration (ms): 146202.7 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.698977E+00 | loss scale: 32768.0 | grad norm: 34704.914 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.88 |
iteration 16400/ 292968 | consumed samples: 33587200 | consumed tokens: 17194631168 | elapsed time per iteration (ms): 148262.6 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.712656E+00 | loss scale: 32768.0 | grad norm: 33383.371 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.67 |
saving checkpoint at iteration 16400 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
[2022-02-05 14:30:45,150] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/mp_rank_01_model_states.pt
[2022-02-05 14:30:45,151] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/mp_rank_00_model_states.pt
[2022-02-05 14:31:27,404] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_99_optim_states.pt
[2022-02-05 14:31:27,878] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_121_optim_states.pt
[2022-02-05 14:31:27,942] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_98_optim_states.pt
[2022-02-05 14:31:28,215] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_120_optim_states.pt
[2022-02-05 14:31:28,847] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_123_optim_states.pt
[2022-02-05 14:31:28,903] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_122_optim_states.pt
[2022-02-05 14:31:29,853] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_101_optim_states.pt
[2022-02-05 14:31:29,976] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_103_optim_states.pt
[2022-02-05 14:31:30,508] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_100_optim_states.pt
[2022-02-05 14:31:30,815] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_102_optim_states.pt
[2022-02-05 14:31:30,931] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_126_optim_states.pt
[2022-02-05 14:31:30,979] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_64_optim_states.pt
[2022-02-05 14:31:31,081] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_97_optim_states.pt
[2022-02-05 14:31:31,182] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_127_optim_states.pt
[2022-02-05 14:31:31,211] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_106_optim_states.pt
[2022-02-05 14:31:31,239] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_124_optim_states.pt
[2022-02-05 14:31:31,277] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_110_optim_states.pt
[2022-02-05 14:31:31,355] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_39_optim_states.pt
[2022-02-05 14:31:31,659] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_107_optim_states.pt
[2022-02-05 14:31:31,713] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_111_optim_states.pt
[2022-02-05 14:31:31,789] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_125_optim_states.pt
[2022-02-05 14:31:32,005] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_115_optim_states.pt
[2022-02-05 14:31:32,022] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_96_optim_states.pt
[2022-02-05 14:31:32,282] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_14_optim_states.pt
[2022-02-05 14:31:32,331] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_114_optim_states.pt
[2022-02-05 14:31:32,385] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_113_optim_states.pt
[2022-02-05 14:31:32,544] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_71_optim_states.pt
[2022-02-05 14:31:32,918] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_109_optim_states.pt
[2022-02-05 14:31:33,014] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_108_optim_states.pt
[2022-02-05 14:31:33,161] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_32_optim_states.pt
[2022-02-05 14:31:33,273] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_104_optim_states.pt
[2022-02-05 14:31:33,298] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_105_optim_states.pt
[2022-02-05 14:31:33,450] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_47_optim_states.pt
[2022-02-05 14:31:33,669] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_46_optim_states.pt
[2022-02-05 14:31:33,708] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_33_optim_states.pt
[2022-02-05 14:31:33,773] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_36_optim_states.pt
[2022-02-05 14:31:33,752] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_65_optim_states.pt
[2022-02-05 14:31:33,793] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_38_optim_states.pt
[2022-02-05 14:31:33,870] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_08_optim_states.pt
[2022-02-05 14:31:33,968] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_112_optim_states.pt
[2022-02-05 14:31:33,949] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_66_optim_states.pt
[2022-02-05 14:31:33,990] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_70_optim_states.pt
[2022-02-05 14:31:34,104] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_15_optim_states.pt
[2022-02-05 14:31:34,226] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_67_optim_states.pt
[2022-02-05 14:31:34,438] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_89_optim_states.pt
[2022-02-05 14:31:34,718] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_91_optim_states.pt
[2022-02-05 14:31:34,836] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_34_optim_states.pt
[2022-02-05 14:31:34,868] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_37_optim_states.pt
[2022-02-05 14:31:34,863] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_11_optim_states.pt
[2022-02-05 14:31:34,906] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_35_optim_states.pt
[2022-02-05 14:31:34,969] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_118_optim_states.pt
[2022-02-05 14:31:34,945] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_88_optim_states.pt
[2022-02-05 14:31:34,959] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_94_optim_states.pt
[2022-02-05 14:31:35,023] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_41_optim_states.pt
[2022-02-05 14:31:35,199] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_68_optim_states.pt
[2022-02-05 14:31:35,233] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_09_optim_states.pt
[2022-02-05 14:31:35,210] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_69_optim_states.pt
[2022-02-05 
14:31:35,179] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-02-05 14:31:35,375] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-02-05 14:31:35,437] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-02-05 14:31:35,458] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-02-05 14:31:35,220] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-02-05 14:31:35,568] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-02-05 14:31:35,571] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-02-05 14:31:35,929] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-02-05 14:31:36,070] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-02-05 14:31:36,097] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-02-05 14:31:36,140] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-02-05 14:31:36,162] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-02-05 14:31:36,223] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-02-05 14:31:36,401] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-02-05 14:31:36,453] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-02-05 14:31:36,474] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-02-05 14:31:36,469] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-02-05 14:31:36,539] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-02-05 14:31:36,560] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-02-05 14:31:36,604] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-02-05 14:31:36,696] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-02-05 14:31:37,024] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-02-05 14:31:37,034] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-02-05 14:31:37,061] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-02-05 14:31:37,105] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-02-05 14:31:37,303] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-02-05 14:31:37,536] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-02-05 14:31:37,652] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-02-05 14:31:37,750] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-02-05 14:31:37,805] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-02-05 14:31:37,842] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-02-05 14:31:37,845] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-02-05 14:31:37,926] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-02-05 14:31:38,085] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-02-05 14:31:38,205] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-02-05 
14:31:38,241] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-02-05 14:31:38,694] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-02-05 14:31:38,828] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-02-05 14:31:39,498] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-02-05 14:31:39,605] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-02-05 14:31:39,935] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-02-05 14:31:41,976] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-02-05 14:31:42,412] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-02-05 14:31:42,625] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-02-05 14:31:42,675] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-02-05 14:31:42,706] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-02-05 14:31:42,768] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-02-05 14:31:43,363] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-02-05 14:31:43,555] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-02-05 14:31:44,110] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-02-05 14:31:44,161] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-02-05 14:31:44,323] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-02-05 14:31:44,508] [INFO] [engine.py:3023:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-02-05 14:31:44,618] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-02-05 14:31:45,082] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-02-05 14:31:45,134] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-02-05 14:31:45,166] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-02-05 14:31:45,657] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-02-05 14:31:45,692] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-02-05 14:31:45,726] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-02-05 14:31:45,798] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-02-05 14:31:46,147] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-02-05 14:31:46,288] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-02-05 14:31:47,321] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-02-05 14:31:47,623] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-02-05 14:31:50,674] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-02-05 14:31:55,913] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-02-05 14:31:59,876] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-02-05 14:31:59,892] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-02-05 14:32:01,328] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-02-05 
14:32:01,458] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16400/zero_pp_rank_0_mp_rank_75_optim_states.pt
successfully saved checkpoint at iteration 16400 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
time (ms) | save-checkpoint: 90825.55
iteration 16401/ 292968 | consumed samples: 33589248 | consumed tokens: 17196597248 | elapsed time per iteration (ms): 236235.4 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.708060E+00 | loss scale: 32768.0 | grad norm: 23877.601 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.009 | TFLOPs: 53.77 |
iteration 16402/ 292968 | consumed samples: 33591296 | consumed tokens: 17198563328 | elapsed time per iteration (ms): 146927.7 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.669815E+00 | loss scale: 32768.0 | grad norm: 24868.995 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.45 |
iteration 16403/ 292968 | consumed samples: 33593344 | consumed tokens: 17200529408 | elapsed time per iteration (ms): 145664.9 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.693204E+00 | loss scale: 32768.0 | grad norm: 21990.978 | num zeros: 0.0 | curriculum seqlen: 960 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.20 |
iteration 16404/ 292968 | consumed samples: 33595392 | consumed tokens: 17202511872 | elapsed time per iteration (ms): 149029.6 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.703935E+00 | loss scale: 32768.0 | grad norm: 26333.055 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.94 |
iteration 16405/ 
292968 | consumed samples: 33597440 | consumed tokens: 17204494336 | elapsed time per iteration (ms): 148449.3 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.702340E+00 | loss scale: 32768.0 | grad norm: 29382.717 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.28 | iteration 16406/ 292968 | consumed samples: 33599488 | consumed tokens: 17206476800 | elapsed time per iteration (ms): 147558.3 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.681581E+00 | loss scale: 32768.0 | grad norm: 34739.068 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.80 | iteration 16407/ 292968 | consumed samples: 33601536 | consumed tokens: 17208459264 | elapsed time per iteration (ms): 147044.5 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.694505E+00 | loss scale: 32768.0 | grad norm: 39142.455 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.10 | iteration 16408/ 292968 | consumed samples: 33603584 | consumed tokens: 17210441728 | elapsed time per iteration (ms): 148687.8 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.682827E+00 | loss scale: 32768.0 | grad norm: 29625.883 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.14 | iteration 16409/ 292968 | consumed samples: 33605632 | consumed tokens: 17212424192 | elapsed time per iteration (ms): 148565.5 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.712432E+00 | loss scale: 32768.0 | grad norm: 23329.588 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.21 | 
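The iteration records above share a regular `key: value | key: value` layout, so their counters can be cross-checked mechanically. The following is a minimal sketch, not part of the training code: `parse_record` is a hypothetical helper, and the two sample strings are abbreviated copies of the records for iterations 16402 and 16403 (fields not needed for the check are omitted). It checks that `consumed tokens` grows by `global batch size` times `curriculum seqlen` from one iteration to the next.

```python
import re


def parse_record(record: str) -> dict:
    """Parse one 'iteration N/ TOTAL | key: value | ...' record into a dict.

    Hypothetical helper for illustration; values become int when possible,
    otherwise float (this covers forms such as 5.942E-05 and 2.669815E+00).
    """
    fields = {}
    head = re.match(r"\s*iteration\s+(\d+)/\s*(\d+)", record)
    if head:
        fields["iteration"] = int(head.group(1))
        fields["total iterations"] = int(head.group(2))
    # Everything after the first '|' is a 'key: value' pair.
    for part in record.split("|")[1:]:
        if ":" not in part:
            continue  # skip the empty tail after the final '|'
        key, value = part.split(":", 1)
        value = value.strip()
        try:
            fields[key.strip()] = int(value)
        except ValueError:
            fields[key.strip()] = float(value)
    return fields


# Abbreviated copies of two consecutive records from the log above.
prev = parse_record(
    "iteration 16402/ 292968 | consumed samples: 33591296 | "
    "consumed tokens: 17198563328 | global batch size: 2048 | "
    "curriculum seqlen: 960 |"
)
curr = parse_record(
    "iteration 16403/ 292968 | consumed samples: 33593344 | "
    "consumed tokens: 17200529408 | global batch size: 2048 | "
    "curriculum seqlen: 960 |"
)

sample_delta = curr["consumed samples"] - prev["consumed samples"]
token_delta = curr["consumed tokens"] - prev["consumed tokens"]
expected_tokens = curr["global batch size"] * curr["curriculum seqlen"]

print(sample_delta, token_delta, expected_tokens)
```

The same check applied to iterations 16403 and 16404 picks up the curriculum step from sequence length 960 to 968, since the token increment there corresponds to the larger value.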
iteration 16410/ 292968 | consumed samples: 33607680 | consumed tokens: 17214406656 | elapsed time per iteration (ms): 147332.2 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.699969E+00 | loss scale: 32768.0 | grad norm: 20470.930 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.93 | iteration 16411/ 292968 | consumed samples: 33609728 | consumed tokens: 17216389120 | elapsed time per iteration (ms): 147300.2 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.689471E+00 | loss scale: 32768.0 | grad norm: 30456.806 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.95 | iteration 16412/ 292968 | consumed samples: 33611776 | consumed tokens: 17218371584 | elapsed time per iteration (ms): 147328.8 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.707765E+00 | loss scale: 32768.0 | grad norm: 38174.722 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.94 | iteration 16413/ 292968 | consumed samples: 33613824 | consumed tokens: 17220354048 | elapsed time per iteration (ms): 147329.6 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.698912E+00 | loss scale: 32768.0 | grad norm: 22052.259 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.93 | iteration 16414/ 292968 | consumed samples: 33615872 | consumed tokens: 17222336512 | elapsed time per iteration (ms): 146999.0 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.678173E+00 | loss scale: 32768.0 | grad norm: 34302.151 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | 
TFLOPs: 87.13 | iteration 16415/ 292968 | consumed samples: 33617920 | consumed tokens: 17224318976 | elapsed time per iteration (ms): 147649.9 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.695144E+00 | loss scale: 32768.0 | grad norm: 40390.696 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.75 | iteration 16416/ 292968 | consumed samples: 33619968 | consumed tokens: 17226301440 | elapsed time per iteration (ms): 147606.9 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.678233E+00 | loss scale: 32768.0 | grad norm: 22314.470 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.77 | iteration 16417/ 292968 | consumed samples: 33622016 | consumed tokens: 17228283904 | elapsed time per iteration (ms): 148733.0 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.650995E+00 | loss scale: 32768.0 | grad norm: 30304.161 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.11 | iteration 16418/ 292968 | consumed samples: 33624064 | consumed tokens: 17230266368 | elapsed time per iteration (ms): 147452.8 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.694137E+00 | loss scale: 32768.0 | grad norm: 32567.292 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.86 | iteration 16419/ 292968 | consumed samples: 33626112 | consumed tokens: 17232248832 | elapsed time per iteration (ms): 147191.9 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.682022E+00 | loss scale: 32768.0 | grad norm: 22188.615 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 0.014 | TFLOPs: 87.02 | iteration 16420/ 292968 | consumed samples: 33628160 | consumed tokens: 17234231296 | elapsed time per iteration (ms): 147353.3 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.692088E+00 | loss scale: 32768.0 | grad norm: 28455.704 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.92 | iteration 16421/ 292968 | consumed samples: 33630208 | consumed tokens: 17236213760 | elapsed time per iteration (ms): 147038.7 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.742752E+00 | loss scale: 32768.0 | grad norm: 43797.228 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.11 | iteration 16422/ 292968 | consumed samples: 33632256 | consumed tokens: 17238196224 | elapsed time per iteration (ms): 148331.2 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.685593E+00 | loss scale: 32768.0 | grad norm: 33006.325 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.35 | iteration 16423/ 292968 | consumed samples: 33634304 | consumed tokens: 17240178688 | elapsed time per iteration (ms): 147537.4 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.725230E+00 | loss scale: 32768.0 | grad norm: 29612.606 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.81 | iteration 16424/ 292968 | consumed samples: 33636352 | consumed tokens: 17242161152 | elapsed time per iteration (ms): 147126.2 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.709745E+00 | loss scale: 32768.0 | grad norm: 22648.066 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 0.014 | TFLOPs: 87.06 | iteration 16425/ 292968 | consumed samples: 33638400 | consumed tokens: 17244143616 | elapsed time per iteration (ms): 147547.7 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.709115E+00 | loss scale: 32768.0 | grad norm: 25209.422 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.81 | iteration 16426/ 292968 | consumed samples: 33640448 | consumed tokens: 17246126080 | elapsed time per iteration (ms): 147259.0 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.717385E+00 | loss scale: 32768.0 | grad norm: 34909.857 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.98 | iteration 16427/ 292968 | consumed samples: 33642496 | consumed tokens: 17248108544 | elapsed time per iteration (ms): 147003.1 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.690730E+00 | loss scale: 32768.0 | grad norm: 45398.754 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.13 | iteration 16428/ 292968 | consumed samples: 33644544 | consumed tokens: 17250091008 | elapsed time per iteration (ms): 147293.7 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.701527E+00 | loss scale: 32768.0 | grad norm: 18785.271 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.96 | iteration 16429/ 292968 | consumed samples: 33646592 | consumed tokens: 17252073472 | elapsed time per iteration (ms): 146965.8 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.714359E+00 | loss scale: 32768.0 | grad norm: 47299.193 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 0.014 | TFLOPs: 87.15 | iteration 16430/ 292968 | consumed samples: 33648640 | consumed tokens: 17254055936 | elapsed time per iteration (ms): 148674.5 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.696582E+00 | loss scale: 32768.0 | grad norm: 59156.408 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.15 | iteration 16431/ 292968 | consumed samples: 33650688 | consumed tokens: 17256038400 | elapsed time per iteration (ms): 146989.5 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.758096E+00 | loss scale: 32768.0 | grad norm: 55176.437 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.14 | iteration 16432/ 292968 | consumed samples: 33652736 | consumed tokens: 17258020864 | elapsed time per iteration (ms): 147376.3 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.758008E+00 | loss scale: 32768.0 | grad norm: 48545.038 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.91 | iteration 16433/ 292968 | consumed samples: 33654784 | consumed tokens: 17260003328 | elapsed time per iteration (ms): 147432.0 | learning rate: 5.942E-05 | global batch size: 2048 | lm loss: 2.706497E+00 | loss scale: 32768.0 | grad norm: 45284.101 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.87 | iteration 16434/ 292968 | consumed samples: 33656832 | consumed tokens: 17261985792 | elapsed time per iteration (ms): 146965.9 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.725157E+00 | loss scale: 32768.0 | grad norm: 36320.892 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.15 | iteration 16435/ 292968 | consumed samples: 33658880 | consumed tokens: 17263968256 | elapsed time per iteration (ms): 147383.1 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.712321E+00 | loss scale: 32768.0 | grad norm: 33373.303 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.90 | iteration 16436/ 292968 | consumed samples: 33660928 | consumed tokens: 17265950720 | elapsed time per iteration (ms): 160133.7 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.734398E+00 | loss scale: 32768.0 | grad norm: 35824.045 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 79.98 | iteration 16437/ 292968 | consumed samples: 33662976 | consumed tokens: 17267933184 | elapsed time per iteration (ms): 147018.7 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.700342E+00 | loss scale: 32768.0 | grad norm: 37511.811 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.12 | iteration 16438/ 292968 | consumed samples: 33665024 | consumed tokens: 17269915648 | elapsed time per iteration (ms): 147826.0 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.694262E+00 | loss scale: 32768.0 | grad norm: 26304.101 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.64 | iteration 16439/ 292968 | consumed samples: 33667072 | consumed tokens: 17271898112 | elapsed time per iteration (ms): 147482.8 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.710235E+00 | loss scale: 32768.0 | grad norm: 30875.305 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.84 | iteration 16440/ 292968 | consumed samples: 33669120 | consumed tokens: 17273880576 | elapsed time per iteration (ms): 147479.9 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.697204E+00 | loss scale: 32768.0 | grad norm: 47246.920 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.85 | iteration 16441/ 292968 | consumed samples: 33671168 | consumed tokens: 17275863040 | elapsed time per iteration (ms): 147203.8 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.711139E+00 | loss scale: 32768.0 | grad norm: 28120.246 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.01 | iteration 16442/ 292968 | consumed samples: 33673216 | consumed tokens: 17277845504 | elapsed time per iteration (ms): 147444.2 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.705093E+00 | loss scale: 32768.0 | grad norm: 35281.323 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.87 | iteration 16443/ 292968 | consumed samples: 33675264 | consumed tokens: 17279827968 | elapsed time per iteration (ms): 150188.2 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.723213E+00 | loss scale: 32768.0 | grad norm: 30419.150 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.28 | iteration 16444/ 292968 | consumed samples: 33677312 | consumed tokens: 17281810432 | elapsed time per iteration (ms): 147228.2 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.711007E+00 | loss scale: 32768.0 | grad norm: 35668.470 | num zeros: 0.0 | curriculum seqlen: 968 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.99 | iteration 16445/ 292968 | consumed samples: 33679360 | consumed tokens: 17283792896 | elapsed time per iteration (ms): 147564.9 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.708809E+00 | loss scale: 32768.0 | grad norm: 40311.636 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.80 | iteration 16446/ 292968 | consumed samples: 33681408 | consumed tokens: 17285775360 | elapsed time per iteration (ms): 147213.1 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.691291E+00 | loss scale: 32768.0 | grad norm: 29639.049 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.00 | iteration 16447/ 292968 | consumed samples: 33683456 | consumed tokens: 17287757824 | elapsed time per iteration (ms): 147486.0 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.696319E+00 | loss scale: 32768.0 | grad norm: 31632.303 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.84 | iteration 16448/ 292968 | consumed samples: 33685504 | consumed tokens: 17289740288 | elapsed time per iteration (ms): 149687.3 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.702728E+00 | loss scale: 32768.0 | grad norm: 30686.285 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.57 | iteration 16449/ 292968 | consumed samples: 33687552 | consumed tokens: 17291722752 | elapsed time per iteration (ms): 147466.8 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.691893E+00 | loss scale: 32768.0 | grad norm: 28717.439 | num zeros: 0.0 | curriculum seqlen: 
968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.85 |
iteration 16450/ 292968 | consumed samples: 33689600 | consumed tokens: 17293705216 | elapsed time per iteration (ms): 147275.3 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.686526E+00 | loss scale: 32768.0 | grad norm: 37641.691 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.97 |
saving checkpoint at iteration 16450 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
[2022-02-05 16:35:15,684] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/mp_rank_00_model_states.pt
[2022-02-05 16:35:15,825] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/mp_rank_01_model_states.pt
[2022-02-05 16:35:35,909] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_121_optim_states.pt
[2022-02-05 16:35:36,591] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_110_optim_states.pt
[2022-02-05 16:35:36,863] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_122_optim_states.pt
[2022-02-05 16:35:37,576] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_111_optim_states.pt
[2022-02-05 16:35:37,858] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-02-05 16:35:38,015] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-02-05 16:35:38,063] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-02-05 16:35:39,003] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-02-05 16:35:39,044] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-02-05 16:35:39,092] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-02-05 16:35:39,157] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-02-05 16:35:39,511] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-02-05 16:35:40,141] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_32_optim_states.pt 
[2022-02-05 16:35:40,255] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-02-05 16:35:40,792] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-02-05 16:35:41,018] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-02-05 16:35:41,580] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-02-05 16:35:41,747] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-02-05 16:35:41,785] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-02-05 16:35:42,169] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-02-05 16:35:42,206] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-02-05 16:35:42,240] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-02-05 16:35:42,601] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-02-05 16:35:42,919] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_124_optim_states.pt [2022-02-05 16:35:42,988] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-02-05 16:35:43,040] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-02-05 16:35:43,225] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-02-05 16:35:43,586] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-02-05 16:35:43,560] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-02-05 16:35:43,723] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-02-05 16:35:43,788] [INFO] [engine.py:3023:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-02-05 16:35:43,820] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-02-05 16:35:43,815] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-02-05 16:35:43,886] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-02-05 16:35:43,921] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-02-05 16:35:44,050] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-02-05 16:35:44,022] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-02-05 16:35:44,171] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-02-05 16:35:44,147] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-02-05 16:35:44,327] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-02-05 16:35:44,478] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-02-05 16:35:44,600] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-02-05 16:35:44,621] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-02-05 16:35:44,943] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-02-05 16:35:44,943] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-02-05 16:35:45,223] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-02-05 16:35:45,465] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-02-05 16:35:45,478] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-02-05 
16:35:45,510] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-02-05 16:35:45,515] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-02-05 16:35:45,581] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-02-05 16:35:45,680] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-02-05 16:35:45,764] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-02-05 16:35:45,770] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-02-05 16:35:45,826] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-02-05 16:35:45,973] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-02-05 16:35:45,994] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-02-05 16:35:46,056] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-02-05 16:35:46,118] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-02-05 16:35:46,126] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-02-05 16:35:46,276] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-02-05 16:35:46,306] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-02-05 16:35:46,321] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-02-05 16:35:46,536] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-02-05 16:35:46,540] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-02-05 16:35:46,554] [INFO] [engine.py:3023:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-02-05 16:35:46,641] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-02-05 16:35:46,847] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-02-05 16:35:46,853] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-02-05 16:35:47,054] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-02-05 16:35:47,391] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-02-05 16:35:47,486] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-02-05 16:35:47,513] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-02-05 16:35:47,606] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-02-05 16:35:47,627] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-02-05 16:35:47,767] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-02-05 16:35:47,780] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-02-05 16:35:47,897] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-02-05 16:35:47,973] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-02-05 16:35:48,133] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-02-05 16:35:48,183] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-02-05 16:35:48,260] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-02-05 16:35:48,434] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-02-05 
16:35:48,470] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-02-05 16:35:48,566] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-02-05 16:35:48,784] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-02-05 16:35:48,810] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-02-05 16:35:48,846] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-02-05 16:35:48,940] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-02-05 16:35:49,045] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-02-05 16:35:49,075] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-02-05 16:35:49,174] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-02-05 16:35:49,198] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-02-05 16:35:49,086] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-02-05 16:35:49,295] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-02-05 16:35:49,362] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-02-05 16:35:49,416] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-02-05 16:35:49,588] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-02-05 16:35:49,603] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-02-05 16:35:49,847] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-02-05 16:35:49,785] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-02-05 16:35:49,962] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-02-05 16:35:50,168] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-02-05 16:35:50,268] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-02-05 16:35:50,322] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-02-05 16:35:50,365] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-02-05 16:35:50,457] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-02-05 16:35:50,471] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-02-05 16:35:50,571] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-02-05 16:35:50,571] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-02-05 16:35:50,670] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-02-05 16:35:50,779] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-02-05 16:35:50,821] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-02-05 16:35:51,767] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-02-05 16:35:51,998] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-02-05 16:35:55,250] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-02-05 16:35:55,706] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-02-05 16:35:55,873] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-02-05 
16:35:56,015] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-02-05 16:35:58,556] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-02-05 16:35:58,594] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-02-05 16:35:59,040] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-02-05 16:35:59,052] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-02-05 16:35:59,060] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-02-05 16:35:59,209] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-02-05 16:35:59,473] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-02-05 16:36:00,511] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_55_optim_states.pt
[2022-02-05 16:36:00,518] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16450/zero_pp_rank_0_mp_rank_54_optim_states.pt
successfully saved checkpoint at iteration 16450 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
time (ms) | save-checkpoint: 49819.93
iteration 16451/ 292968 | consumed samples: 33691648 | consumed tokens: 17295687680 | elapsed time per iteration (ms): 196590.1 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.686148E+00 | loss scale: 32768.0 | grad norm: 25320.316 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.010 | TFLOPs: 65.15 |
iteration 16452/ 292968 | consumed samples: 33693696 | consumed tokens: 17297670144 | elapsed time per iteration (ms): 146496.6 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.690838E+00 | loss scale: 32768.0 | grad norm: 23449.957 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.43 |
iteration 16453/ 292968 | consumed samples: 33695744 | consumed tokens: 17299652608 | elapsed time per iteration (ms): 146677.3 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.683933E+00 | loss scale: 32768.0 | grad norm: 38113.039 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.32 |
iteration 16454/ 292968 | consumed samples: 33697792 | consumed tokens: 17301635072 | elapsed time per iteration (ms): 147913.9 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.685622E+00 | loss scale: 32768.0 | grad norm: 32466.319 | num zeros: 0.0 |
curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.59 | iteration 16455/ 292968 | consumed samples: 33699840 | consumed tokens: 17303617536 | elapsed time per iteration (ms): 147028.7 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.680632E+00 | loss scale: 32768.0 | grad norm: 37993.568 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.11 | iteration 16456/ 292968 | consumed samples: 33701888 | consumed tokens: 17305600000 | elapsed time per iteration (ms): 146927.1 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.715026E+00 | loss scale: 32768.0 | grad norm: 26132.456 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.17 | iteration 16457/ 292968 | consumed samples: 33703936 | consumed tokens: 17307582464 | elapsed time per iteration (ms): 146755.0 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.697108E+00 | loss scale: 32768.0 | grad norm: 25886.609 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.28 | iteration 16458/ 292968 | consumed samples: 33705984 | consumed tokens: 17309564928 | elapsed time per iteration (ms): 146928.9 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.714430E+00 | loss scale: 32768.0 | grad norm: 30720.614 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.17 | iteration 16459/ 292968 | consumed samples: 33708032 | consumed tokens: 17311547392 | elapsed time per iteration (ms): 147046.4 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.671201E+00 | loss scale: 32768.0 | grad norm: 33202.156 | num 
zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.10 | iteration 16460/ 292968 | consumed samples: 33710080 | consumed tokens: 17313529856 | elapsed time per iteration (ms): 147944.8 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.684351E+00 | loss scale: 32768.0 | grad norm: 33025.379 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.57 | iteration 16461/ 292968 | consumed samples: 33712128 | consumed tokens: 17315512320 | elapsed time per iteration (ms): 146968.3 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.712806E+00 | loss scale: 32768.0 | grad norm: 38762.715 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.15 | iteration 16462/ 292968 | consumed samples: 33714176 | consumed tokens: 17317494784 | elapsed time per iteration (ms): 148302.6 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.683380E+00 | loss scale: 32768.0 | grad norm: 29136.481 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.36 | iteration 16463/ 292968 | consumed samples: 33716224 | consumed tokens: 17319477248 | elapsed time per iteration (ms): 147430.1 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.697310E+00 | loss scale: 32768.0 | grad norm: 28365.698 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.88 | iteration 16464/ 292968 | consumed samples: 33718272 | consumed tokens: 17321459712 | elapsed time per iteration (ms): 147258.1 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.671264E+00 | loss scale: 32768.0 | grad norm: 
38670.357 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.98 | iteration 16465/ 292968 | consumed samples: 33720320 | consumed tokens: 17323442176 | elapsed time per iteration (ms): 147031.1 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.666638E+00 | loss scale: 32768.0 | grad norm: 23735.173 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.11 | iteration 16466/ 292968 | consumed samples: 33722368 | consumed tokens: 17325424640 | elapsed time per iteration (ms): 149568.4 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.690699E+00 | loss scale: 32768.0 | grad norm: 20141.016 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.63 | iteration 16467/ 292968 | consumed samples: 33724416 | consumed tokens: 17327407104 | elapsed time per iteration (ms): 147269.1 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.685800E+00 | loss scale: 32768.0 | grad norm: 29141.328 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.97 | iteration 16468/ 292968 | consumed samples: 33726464 | consumed tokens: 17329389568 | elapsed time per iteration (ms): 146879.7 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.703945E+00 | loss scale: 32768.0 | grad norm: 31748.151 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.20 | iteration 16469/ 292968 | consumed samples: 33728512 | consumed tokens: 17331372032 | elapsed time per iteration (ms): 146948.5 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.674914E+00 | loss scale: 32768.0 | 
grad norm: 34627.421 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.16 | iteration 16470/ 292968 | consumed samples: 33730560 | consumed tokens: 17333354496 | elapsed time per iteration (ms): 147479.8 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.679406E+00 | loss scale: 32768.0 | grad norm: 28719.833 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.85 | iteration 16471/ 292968 | consumed samples: 33732608 | consumed tokens: 17335336960 | elapsed time per iteration (ms): 147522.1 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.690949E+00 | loss scale: 32768.0 | grad norm: 29935.487 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.82 | iteration 16472/ 292968 | consumed samples: 33734656 | consumed tokens: 17337319424 | elapsed time per iteration (ms): 147312.2 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.708275E+00 | loss scale: 32768.0 | grad norm: 30302.591 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.95 | iteration 16473/ 292968 | consumed samples: 33736704 | consumed tokens: 17339301888 | elapsed time per iteration (ms): 146915.9 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.708289E+00 | loss scale: 32768.0 | grad norm: 28459.309 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.18 | iteration 16474/ 292968 | consumed samples: 33738752 | consumed tokens: 17341284352 | elapsed time per iteration (ms): 147265.6 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.688894E+00 | loss scale: 
32768.0 | grad norm: 24161.486 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.97 | iteration 16475/ 292968 | consumed samples: 33740800 | consumed tokens: 17343266816 | elapsed time per iteration (ms): 147355.7 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.660993E+00 | loss scale: 32768.0 | grad norm: 28983.629 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.92 | iteration 16476/ 292968 | consumed samples: 33742848 | consumed tokens: 17345249280 | elapsed time per iteration (ms): 147698.4 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.665717E+00 | loss scale: 32768.0 | grad norm: 30156.152 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.72 | iteration 16477/ 292968 | consumed samples: 33744896 | consumed tokens: 17347231744 | elapsed time per iteration (ms): 148545.8 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.683379E+00 | loss scale: 32768.0 | grad norm: 26980.403 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.22 | iteration 16478/ 292968 | consumed samples: 33746944 | consumed tokens: 17349214208 | elapsed time per iteration (ms): 147229.2 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.711564E+00 | loss scale: 32768.0 | grad norm: 29803.332 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.99 | iteration 16479/ 292968 | consumed samples: 33748992 | consumed tokens: 17351196672 | elapsed time per iteration (ms): 147405.5 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.663066E+00 | 
loss scale: 32768.0 | grad norm: 28968.613 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.89 | iteration 16480/ 292968 | consumed samples: 33751040 | consumed tokens: 17353179136 | elapsed time per iteration (ms): 147683.3 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.703747E+00 | loss scale: 32768.0 | grad norm: 27507.602 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.73 | iteration 16481/ 292968 | consumed samples: 33753088 | consumed tokens: 17355161600 | elapsed time per iteration (ms): 147028.9 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.662470E+00 | loss scale: 32768.0 | grad norm: 23847.852 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.11 | iteration 16482/ 292968 | consumed samples: 33755136 | consumed tokens: 17357144064 | elapsed time per iteration (ms): 149155.1 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.683861E+00 | loss scale: 32768.0 | grad norm: 34627.422 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.87 | iteration 16483/ 292968 | consumed samples: 33757184 | consumed tokens: 17359126528 | elapsed time per iteration (ms): 147493.2 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.694278E+00 | loss scale: 32768.0 | grad norm: 46248.661 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.84 | iteration 16484/ 292968 | consumed samples: 33759232 | consumed tokens: 17361108992 | elapsed time per iteration (ms): 147416.3 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 
2.689365E+00 | loss scale: 32768.0 | grad norm: 20870.504 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.88 | iteration 16485/ 292968 | consumed samples: 33761280 | consumed tokens: 17363091456 | elapsed time per iteration (ms): 147430.6 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.711801E+00 | loss scale: 32768.0 | grad norm: 66342.984 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.88 | iteration 16486/ 292968 | consumed samples: 33763328 | consumed tokens: 17365073920 | elapsed time per iteration (ms): 147753.7 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.693613E+00 | loss scale: 32768.0 | grad norm: 20960.629 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.69 | iteration 16487/ 292968 | consumed samples: 33765376 | consumed tokens: 17367056384 | elapsed time per iteration (ms): 147364.8 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.696064E+00 | loss scale: 32768.0 | grad norm: 52465.610 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.91 | iteration 16488/ 292968 | consumed samples: 33767424 | consumed tokens: 17369038848 | elapsed time per iteration (ms): 149226.2 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.686584E+00 | loss scale: 32768.0 | grad norm: 28185.499 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.83 | iteration 16489/ 292968 | consumed samples: 33769472 | consumed tokens: 17371021312 | elapsed time per iteration (ms): 147499.0 | learning rate: 5.941E-05 | global batch size: 2048 
| lm loss: 2.687126E+00 | loss scale: 32768.0 | grad norm: 56702.056 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.84 | iteration 16490/ 292968 | consumed samples: 33771520 | consumed tokens: 17373003776 | elapsed time per iteration (ms): 147406.4 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.691689E+00 | loss scale: 32768.0 | grad norm: 33046.955 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.89 | iteration 16491/ 292968 | consumed samples: 33773568 | consumed tokens: 17374986240 | elapsed time per iteration (ms): 146892.8 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.689749E+00 | loss scale: 32768.0 | grad norm: 34953.015 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.19 | iteration 16492/ 292968 | consumed samples: 33775616 | consumed tokens: 17376968704 | elapsed time per iteration (ms): 147545.4 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.695087E+00 | loss scale: 32768.0 | grad norm: 41812.371 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.81 | iteration 16493/ 292968 | consumed samples: 33777664 | consumed tokens: 17378951168 | elapsed time per iteration (ms): 148114.5 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.688991E+00 | loss scale: 32768.0 | grad norm: 25667.038 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.47 | iteration 16494/ 292968 | consumed samples: 33779712 | consumed tokens: 17380933632 | elapsed time per iteration (ms): 147233.2 | learning rate: 5.941E-05 | global batch 
size: 2048 | lm loss: 2.676681E+00 | loss scale: 32768.0 | grad norm: 42499.482 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.99 | iteration 16495/ 292968 | consumed samples: 33781760 | consumed tokens: 17382916096 | elapsed time per iteration (ms): 146949.0 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.687480E+00 | loss scale: 32768.0 | grad norm: 26507.737 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.16 | iteration 16496/ 292968 | consumed samples: 33783808 | consumed tokens: 17384898560 | elapsed time per iteration (ms): 147235.5 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.669065E+00 | loss scale: 32768.0 | grad norm: 29326.420 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.99 | iteration 16497/ 292968 | consumed samples: 33785856 | consumed tokens: 17386881024 | elapsed time per iteration (ms): 147677.8 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.680405E+00 | loss scale: 32768.0 | grad norm: 31356.332 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.73 | iteration 16498/ 292968 | consumed samples: 33787904 | consumed tokens: 17388863488 | elapsed time per iteration (ms): 147133.2 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.695646E+00 | loss scale: 32768.0 | grad norm: 25369.685 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.05 | iteration 16499/ 292968 | consumed samples: 33789952 | consumed tokens: 17390845952 | elapsed time per iteration (ms): 149158.4 | learning rate: 5.941E-05 | 
global batch size: 2048 | lm loss: 2.665529E+00 | loss scale: 32768.0 | grad norm: 25761.684 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.87 |
iteration 16500/ 292968 | consumed samples: 33792000 | consumed tokens: 17392828416 | elapsed time per iteration (ms): 147325.4 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.698261E+00 | loss scale: 32768.0 | grad norm: 29762.940 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.94 |
-------------------------------------------------------------------------------------------
valid loss at iteration 16500 | lm loss value: 3.101636E+00 | lm loss PPL: 2.223431E+01 |
-------------------------------------------------------------------------------------------
saving checkpoint at iteration 16500 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
[2022-02-05 18:46:44,836] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/mp_rank_00_model_states.pt
[2022-02-05 18:46:44,899] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/mp_rank_01_model_states.pt
[2022-02-05 18:47:52,291] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_91_optim_states.pt
[2022-02-05 18:47:52,308] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_94_optim_states.pt
[2022-02-05 18:47:52,558] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-02-05 18:47:52,947] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-02-05 18:47:52,995] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-02-05 18:47:53,302] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-02-05 18:47:54,084] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-02-05 18:47:54,194] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-02-05 18:47:54,399] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-02-05 18:47:54,444] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-02-05 18:47:54,455] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-02-05 18:47:54,568] [INFO] [engine.py:3023:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-02-05 18:47:54,840] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-02-05 18:47:55,304] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-02-05 18:47:55,378] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-02-05 18:47:55,511] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-02-05 18:47:56,322] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-02-05 18:47:56,613] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-02-05 18:47:56,613] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-02-05 18:47:56,858] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-02-05 18:47:57,080] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-02-05 18:47:57,207] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-02-05 18:47:57,271] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-02-05 18:47:57,497] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_124_optim_states.pt [2022-02-05 18:47:57,620] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-02-05 18:47:57,843] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-02-05 18:47:57,874] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-02-05 18:47:57,974] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-02-05 18:47:57,976] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-02-05 
18:47:57,998] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-02-05 18:47:58,333] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-02-05 18:47:58,376] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-02-05 18:47:58,496] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-02-05 18:47:58,690] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-02-05 18:47:58,781] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-02-05 18:47:58,786] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-02-05 18:47:58,974] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-02-05 18:47:59,034] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-02-05 18:47:59,718] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-02-05 18:47:59,936] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-02-05 18:48:00,016] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-02-05 18:48:00,044] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-02-05 18:48:00,169] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-02-05 18:48:00,218] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-02-05 18:48:00,225] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-02-05 18:48:00,231] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-02-05 18:48:00,281] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-02-05 18:48:00,408] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-02-05 18:48:00,447] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-02-05 18:48:00,786] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-02-05 18:48:00,833] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-02-05 18:48:01,561] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-02-05 18:48:01,789] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-02-05 18:48:02,107] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-02-05 18:48:02,142] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-02-05 18:48:02,216] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-02-05 18:48:02,566] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-02-05 18:48:03,066] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-02-05 18:48:03,182] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-02-05 18:48:03,414] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-02-05 18:48:03,976] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-02-05 18:48:04,501] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-02-05 18:48:04,332] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-02-05 18:48:04,997] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-02-05 
18:48:05,358] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-02-05 18:48:05,643] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-02-05 18:48:06,163] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-02-05 18:48:06,031] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-02-05 18:48:06,031] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-02-05 18:48:06,071] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-02-05 18:48:06,075] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-02-05 18:48:06,104] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-02-05 18:48:06,618] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-02-05 18:48:06,787] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-02-05 18:48:06,818] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-02-05 18:48:06,957] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-02-05 18:48:07,015] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-02-05 18:48:07,066] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-02-05 18:48:07,098] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-02-05 18:48:07,108] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-02-05 18:48:07,181] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-02-05 18:48:07,187] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-02-05 18:48:07,461] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-02-05 18:48:07,930] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-02-05 18:48:08,230] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-02-05 18:48:08,246] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-02-05 18:48:08,554] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-02-05 18:48:08,783] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-02-05 18:48:08,802] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-02-05 18:48:08,810] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-02-05 18:48:08,874] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-02-05 18:48:08,931] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-02-05 18:48:09,016] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-02-05 18:48:09,042] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-02-05 18:48:09,090] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-02-05 18:48:09,112] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-02-05 18:48:09,246] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-02-05 18:48:09,545] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-02-05 18:48:09,627] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-02-05 
18:48:10,329] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-02-05 18:48:10,530] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-02-05 18:48:10,585] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-02-05 18:48:10,678] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-02-05 18:48:10,747] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-02-05 18:48:10,749] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-02-05 18:48:10,827] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-02-05 18:48:11,887] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-02-05 18:48:12,020] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-02-05 18:48:14,927] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-02-05 18:48:15,078] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-02-05 18:48:15,146] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-02-05 18:48:15,203] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-02-05 18:48:16,474] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-02-05 18:48:16,533] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-02-05 18:48:18,416] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-02-05 18:48:18,443] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-02-05 18:48:19,468] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-02-05 18:48:19,520] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-02-05 18:48:20,413] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-02-05 18:48:20,494] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-02-05 18:48:20,773] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-02-05 18:48:21,055] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-02-05 18:48:21,865] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-02-05 18:48:22,115] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-02-05 18:48:22,266] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-02-05 18:48:22,270] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_41_optim_states.pt
[2022-02-05 18:48:23,691] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_45_optim_states.pt
[2022-02-05 18:48:23,758] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16500/zero_pp_rank_0_mp_rank_44_optim_states.pt
successfully saved checkpoint at iteration 16500 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
time (ms) | save-checkpoint: 114350.39
iteration 16501/ 292968 | consumed samples: 33794048 | consumed tokens: 17394810880 | elapsed time per iteration (ms): 720711.9 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.692571E+00 | loss scale: 32768.0 | grad norm: 42418.947 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.003 | TFLOPs: 17.77 |
iteration 16502/ 292968 | consumed samples: 33796096 | consumed tokens: 17396793344 | elapsed time per iteration (ms): 147738.3 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.671998E+00 | loss scale: 32768.0 | grad norm: 19632.038 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.69 |
iteration 16503/ 292968 | consumed samples: 33798144 | consumed tokens: 17398775808 | elapsed time per iteration (ms): 148774.2 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.691600E+00 | loss scale: 32768.0 | grad norm: 40637.362 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs:
86.09 | iteration 16504/ 292968 | consumed samples: 33800192 | consumed tokens: 17400758272 | elapsed time per iteration (ms): 147431.4 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.667098E+00 | loss scale: 32768.0 | grad norm: 34590.401 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.87 | iteration 16505/ 292968 | consumed samples: 33802240 | consumed tokens: 17402740736 | elapsed time per iteration (ms): 147257.0 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.684803E+00 | loss scale: 32768.0 | grad norm: 20811.691 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.98 | iteration 16506/ 292968 | consumed samples: 33804288 | consumed tokens: 17404723200 | elapsed time per iteration (ms): 147730.0 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.688989E+00 | loss scale: 32768.0 | grad norm: 34480.147 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.70 | iteration 16507/ 292968 | consumed samples: 33806336 | consumed tokens: 17406705664 | elapsed time per iteration (ms): 147386.7 | learning rate: 5.941E-05 | global batch size: 2048 | lm loss: 2.664320E+00 | loss scale: 32768.0 | grad norm: 38378.202 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.90 | iteration 16508/ 292968 | consumed samples: 33808384 | consumed tokens: 17408688128 | elapsed time per iteration (ms): 149460.8 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.677592E+00 | loss scale: 32768.0 | grad norm: 26100.386 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 
| TFLOPs: 85.70 | iteration 16509/ 292968 | consumed samples: 33810432 | consumed tokens: 17410670592 | elapsed time per iteration (ms): 147690.9 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.706434E+00 | loss scale: 32768.0 | grad norm: 37210.401 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.72 | iteration 16510/ 292968 | consumed samples: 33812480 | consumed tokens: 17412653056 | elapsed time per iteration (ms): 147831.1 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.686464E+00 | loss scale: 32768.0 | grad norm: 31754.504 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.64 | iteration 16511/ 292968 | consumed samples: 33814528 | consumed tokens: 17414635520 | elapsed time per iteration (ms): 148567.4 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.682412E+00 | loss scale: 32768.0 | grad norm: 28641.701 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.21 | iteration 16512/ 292968 | consumed samples: 33816576 | consumed tokens: 17416617984 | elapsed time per iteration (ms): 147414.5 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.700256E+00 | loss scale: 32768.0 | grad norm: 29117.506 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.88 | iteration 16513/ 292968 | consumed samples: 33818624 | consumed tokens: 17418600448 | elapsed time per iteration (ms): 147736.4 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.660530E+00 | loss scale: 32768.0 | grad norm: 38187.861 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 0.014 | TFLOPs: 86.70 | iteration 16514/ 292968 | consumed samples: 33820672 | consumed tokens: 17420582912 | elapsed time per iteration (ms): 147985.4 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.669160E+00 | loss scale: 32768.0 | grad norm: 29839.590 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.55 | iteration 16515/ 292968 | consumed samples: 33822720 | consumed tokens: 17422565376 | elapsed time per iteration (ms): 149472.4 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.704391E+00 | loss scale: 32768.0 | grad norm: 28783.296 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.69 | iteration 16516/ 292968 | consumed samples: 33824768 | consumed tokens: 17424547840 | elapsed time per iteration (ms): 148329.5 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.695049E+00 | loss scale: 32768.0 | grad norm: 38846.473 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.35 | iteration 16517/ 292968 | consumed samples: 33826816 | consumed tokens: 17426530304 | elapsed time per iteration (ms): 147365.3 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.660226E+00 | loss scale: 32768.0 | grad norm: 30622.447 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.91 | iteration 16518/ 292968 | consumed samples: 33828864 | consumed tokens: 17428512768 | elapsed time per iteration (ms): 147631.3 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.703196E+00 | loss scale: 32768.0 | grad norm: 26739.837 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 0.014 | TFLOPs: 86.76 | iteration 16519/ 292968 | consumed samples: 33830912 | consumed tokens: 17430495232 | elapsed time per iteration (ms): 147672.1 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.686827E+00 | loss scale: 32768.0 | grad norm: 25475.141 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.73 | iteration 16520/ 292968 | consumed samples: 33832960 | consumed tokens: 17432477696 | elapsed time per iteration (ms): 151743.6 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.683736E+00 | loss scale: 32768.0 | grad norm: 29059.232 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.41 | iteration 16521/ 292968 | consumed samples: 33835008 | consumed tokens: 17434460160 | elapsed time per iteration (ms): 148645.6 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.666845E+00 | loss scale: 32768.0 | grad norm: 38020.157 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.17 | iteration 16522/ 292968 | consumed samples: 33837056 | consumed tokens: 17436442624 | elapsed time per iteration (ms): 147462.7 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.678149E+00 | loss scale: 32768.0 | grad norm: 42214.472 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.86 | iteration 16523/ 292968 | consumed samples: 33839104 | consumed tokens: 17438425088 | elapsed time per iteration (ms): 147531.2 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.693798E+00 | loss scale: 32768.0 | grad norm: 20321.649 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 0.014 | TFLOPs: 86.82 | iteration 16524/ 292968 | consumed samples: 33841152 | consumed tokens: 17440407552 | elapsed time per iteration (ms): 147370.5 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.674758E+00 | loss scale: 32768.0 | grad norm: 26879.330 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.91 | iteration 16525/ 292968 | consumed samples: 33843200 | consumed tokens: 17442390016 | elapsed time per iteration (ms): 147301.6 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.686138E+00 | loss scale: 32768.0 | grad norm: 38837.340 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.95 | iteration 16526/ 292968 | consumed samples: 33845248 | consumed tokens: 17444372480 | elapsed time per iteration (ms): 147529.6 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.702304E+00 | loss scale: 32768.0 | grad norm: 28724.083 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.82 | iteration 16527/ 292968 | consumed samples: 33847296 | consumed tokens: 17446354944 | elapsed time per iteration (ms): 148814.4 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.651311E+00 | loss scale: 32768.0 | grad norm: 36724.951 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.07 | iteration 16528/ 292968 | consumed samples: 33849344 | consumed tokens: 17448337408 | elapsed time per iteration (ms): 147874.2 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.674902E+00 | loss scale: 32768.0 | grad norm: 26794.556 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.61 | iteration 16529/ 292968 | consumed samples: 33851392 | consumed tokens: 17450319872 | elapsed time per iteration (ms): 148468.8 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.689453E+00 | loss scale: 32768.0 | grad norm: 39121.395 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.27 | iteration 16530/ 292968 | consumed samples: 33853440 | consumed tokens: 17452302336 | elapsed time per iteration (ms): 147817.2 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.692978E+00 | loss scale: 32768.0 | grad norm: 49860.620 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.65 | iteration 16531/ 292968 | consumed samples: 33855488 | consumed tokens: 17454284800 | elapsed time per iteration (ms): 147884.0 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.682027E+00 | loss scale: 32768.0 | grad norm: 27661.658 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.61 | iteration 16532/ 292968 | consumed samples: 33857536 | consumed tokens: 17456267264 | elapsed time per iteration (ms): 147585.3 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.692860E+00 | loss scale: 32768.0 | grad norm: 85084.577 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.78 | iteration 16533/ 292968 | consumed samples: 33859584 | consumed tokens: 17458249728 | elapsed time per iteration (ms): 147426.4 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.702213E+00 | loss scale: 32768.0 | grad norm: 55246.082 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.88 | iteration 16534/ 292968 | consumed samples: 33861632 | consumed tokens: 17460232192 | elapsed time per iteration (ms): 156655.0 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.685696E+00 | loss scale: 32768.0 | grad norm: 56413.618 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 81.76 | iteration 16535/ 292968 | consumed samples: 33863680 | consumed tokens: 17462214656 | elapsed time per iteration (ms): 147501.9 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.684005E+00 | loss scale: 32768.0 | grad norm: 53182.183 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.83 | iteration 16536/ 292968 | consumed samples: 33865728 | consumed tokens: 17464197120 | elapsed time per iteration (ms): 146848.2 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.686410E+00 | loss scale: 32768.0 | grad norm: 38330.396 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.22 | iteration 16537/ 292968 | consumed samples: 33867776 | consumed tokens: 17466179584 | elapsed time per iteration (ms): 147111.3 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.689771E+00 | loss scale: 32768.0 | grad norm: 27965.911 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.06 | iteration 16538/ 292968 | consumed samples: 33869824 | consumed tokens: 17468162048 | elapsed time per iteration (ms): 147445.4 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.686727E+00 | loss scale: 32768.0 | grad norm: 36381.559 | num zeros: 0.0 | curriculum seqlen: 968 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.87 | iteration 16539/ 292968 | consumed samples: 33871872 | consumed tokens: 17470144512 | elapsed time per iteration (ms): 147215.3 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.708779E+00 | loss scale: 32768.0 | grad norm: 30300.774 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.00 | iteration 16540/ 292968 | consumed samples: 33873920 | consumed tokens: 17472126976 | elapsed time per iteration (ms): 149932.5 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.700877E+00 | loss scale: 32768.0 | grad norm: 36710.675 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.43 | iteration 16541/ 292968 | consumed samples: 33875968 | consumed tokens: 17474109440 | elapsed time per iteration (ms): 147676.8 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.705829E+00 | loss scale: 32768.0 | grad norm: 33963.428 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.73 | iteration 16542/ 292968 | consumed samples: 33878016 | consumed tokens: 17476091904 | elapsed time per iteration (ms): 147441.6 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.681992E+00 | loss scale: 32768.0 | grad norm: 44198.306 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.87 | iteration 16543/ 292968 | consumed samples: 33880064 | consumed tokens: 17478074368 | elapsed time per iteration (ms): 147468.2 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.698549E+00 | loss scale: 32768.0 | grad norm: 20820.664 | num zeros: 0.0 | curriculum seqlen: 
968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.85 | iteration 16544/ 292968 | consumed samples: 33882112 | consumed tokens: 17480056832 | elapsed time per iteration (ms): 147337.3 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.693924E+00 | loss scale: 32768.0 | grad norm: 28361.352 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.93 | iteration 16545/ 292968 | consumed samples: 33884160 | consumed tokens: 17482039296 | elapsed time per iteration (ms): 147035.2 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.699790E+00 | loss scale: 32768.0 | grad norm: 32883.582 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.11 | iteration 16546/ 292968 | consumed samples: 33886208 | consumed tokens: 17484021760 | elapsed time per iteration (ms): 148557.5 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.698538E+00 | loss scale: 32768.0 | grad norm: 40082.686 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.22 | iteration 16547/ 292968 | consumed samples: 33888256 | consumed tokens: 17486004224 | elapsed time per iteration (ms): 147792.5 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.676121E+00 | loss scale: 32768.0 | grad norm: 25663.200 | num zeros: 0.0 | curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.66 | iteration 16548/ 292968 | consumed samples: 33890304 | consumed tokens: 17487986688 | elapsed time per iteration (ms): 150771.9 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.683575E+00 | loss scale: 32768.0 | grad norm: 32789.687 | num zeros: 0.0 | 
curriculum seqlen: 968 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 84.95 | iteration 16549/ 292968 | consumed samples: 33892352 | consumed tokens: 17489985536 | elapsed time per iteration (ms): 147056.5 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.699734E+00 | loss scale: 32768.0 | grad norm: 47450.176 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.82 | iteration 16550/ 292968 | consumed samples: 33894400 | consumed tokens: 17491984384 | elapsed time per iteration (ms): 147439.4 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.691302E+00 | loss scale: 32768.0 | grad norm: 25711.041 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.59 | saving checkpoint at iteration 16550 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 [2022-02-05 20:51:57,742] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/mp_rank_01_model_states.pt [2022-02-05 20:51:58,013] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/mp_rank_00_model_states.pt [2022-02-05 20:52:19,683] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-02-05 20:52:19,849] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-02-05 20:52:19,891] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint 
saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-02-05 20:52:20,093] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-02-05 20:52:20,196] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-02-05 20:52:20,749] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-02-05 20:52:20,859] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-02-05 20:52:21,350] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-02-05 20:52:21,444] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-02-05 20:52:21,619] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-02-05 20:52:21,665] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-02-05 20:52:21,898] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-02-05 20:52:21,987] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-02-05 20:52:22,216] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-02-05 20:52:22,422] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-02-05 20:52:22,572] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-02-05 20:52:22,591] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-02-05 20:52:22,603] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-02-05 20:52:22,605] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-02-05 20:52:22,565] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-02-05 
20:52:22,726] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-02-05 20:52:22,845] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-02-05 20:52:23,017] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-02-05 20:52:23,018] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-02-05 20:52:23,061] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-02-05 20:52:23,125] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-02-05 20:52:23,376] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-02-05 20:52:23,392] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-02-05 20:52:23,512] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_124_optim_states.pt [2022-02-05 20:52:23,568] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-02-05 20:52:23,580] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-02-05 20:52:23,611] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-02-05 20:52:23,644] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-02-05 20:52:23,760] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-02-05 20:52:23,875] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-02-05 20:52:23,866] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-02-05 20:52:24,063] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-02-05 20:52:24,198] [INFO] [engine.py:3023:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-02-05 20:52:24,540] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-02-05 20:52:24,778] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-02-05 20:52:25,014] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-02-05 20:52:25,350] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-02-05 20:52:25,374] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-02-05 20:52:25,407] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-02-05 20:52:25,654] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-02-05 20:52:25,692] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-02-05 20:52:25,633] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-02-05 20:52:25,949] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-02-05 20:52:26,108] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-02-05 20:52:26,298] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-02-05 20:52:26,334] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-02-05 20:52:26,457] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-02-05 20:52:26,477] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-02-05 20:52:26,485] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-02-05 20:52:26,500] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-02-05 
20:52:26,502] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-02-05 20:52:26,577] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-02-05 20:52:26,506] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-02-05 20:52:27,267] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-02-05 20:52:27,363] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-02-05 20:52:27,477] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-02-05 20:52:27,612] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-02-05 20:52:27,860] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-02-05 20:52:27,943] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-02-05 20:52:28,478] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-02-05 20:52:28,738] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-02-05 20:52:28,802] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-02-05 20:52:28,862] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-02-05 20:52:29,534] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-02-05 20:52:29,621] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-02-05 20:52:30,005] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-02-05 20:52:30,121] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-02-05 20:52:30,564] [INFO] [engine.py:3023:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-02-05 20:52:30,674] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-02-05 20:52:30,835] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-02-05 20:52:31,013] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-02-05 20:52:31,046] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-02-05 20:52:31,156] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-02-05 20:52:31,192] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-02-05 20:52:31,248] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-02-05 20:52:31,322] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-02-05 20:52:31,266] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-02-05 20:52:31,294] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-02-05 20:52:31,587] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-02-05 20:52:31,475] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-02-05 20:52:31,835] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-02-05 20:52:32,338] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-02-05 20:52:32,359] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-02-05 20:52:32,397] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-02-05 20:52:32,492] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-02-05 
20:52:32,883] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-02-05 20:52:33,136] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-02-05 20:52:33,178] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-02-05 20:52:33,192] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-02-05 20:52:33,700] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-02-05 20:52:33,871] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-02-05 20:52:33,881] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-02-05 20:52:33,915] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-02-05 20:52:34,075] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-02-05 20:52:34,143] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-02-05 20:52:34,666] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-02-05 20:52:35,171] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-02-05 20:52:35,244] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-02-05 20:52:35,259] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-02-05 20:52:35,652] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-02-05 20:52:35,754] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-02-05 20:52:35,811] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-02-05 20:52:36,523] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-02-05 20:52:36,543] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-02-05 20:52:37,884] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-02-05 20:52:38,186] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-02-05 20:52:38,304] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-02-05 20:52:39,067] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-02-05 20:52:39,170] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-02-05 20:52:39,320] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-02-05 20:52:40,355] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-02-05 20:52:40,830] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-02-05 20:52:40,919] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-02-05 20:52:42,143] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-02-05 20:52:42,163] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-02-05 20:52:42,300] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-02-05 20:52:42,318] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-02-05 20:52:43,573] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-02-05 20:52:43,647] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-02-05 20:52:43,842] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-02-05 
20:52:43,845] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-02-05 20:52:43,928] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-02-05 20:52:43,994] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16550/zero_pp_rank_0_mp_rank_75_optim_states.pt successfully saved checkpoint at iteration 16550 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 time (ms) | save-checkpoint: 50950.83 iteration 16551/ 292968 | consumed samples: 33896448 | consumed tokens: 17493983232 | elapsed time per iteration (ms): 197999.9 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.672669E+00 | loss scale: 32768.0 | grad norm: 41927.050 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.010 | TFLOPs: 65.22 | iteration 16552/ 292968 | consumed samples: 33898496 | consumed tokens: 17495982080 | elapsed time per iteration (ms): 147012.8 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.701799E+00 | loss scale: 32768.0 | grad norm: 27305.851 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.84 | iteration 16553/ 292968 | consumed samples: 33900544 | consumed tokens: 17497980928 | elapsed time per iteration (ms): 148819.0 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.678376E+00 | loss scale: 32768.0 | grad norm: 23802.510 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 0.014 | TFLOPs: 86.78 | iteration 16554/ 292968 | consumed samples: 33902592 | consumed tokens: 17499979776 | elapsed time per iteration (ms): 146693.2 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.689219E+00 | loss scale: 32768.0 | grad norm: 32005.725 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.03 | iteration 16555/ 292968 | consumed samples: 33904640 | consumed tokens: 17501978624 | elapsed time per iteration (ms): 146849.0 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.691136E+00 | loss scale: 32768.0 | grad norm: 45519.811 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.94 | iteration 16556/ 292968 | consumed samples: 33906688 | consumed tokens: 17503977472 | elapsed time per iteration (ms): 147843.9 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.702173E+00 | loss scale: 32768.0 | grad norm: 22424.150 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.35 | iteration 16557/ 292968 | consumed samples: 33908736 | consumed tokens: 17505976320 | elapsed time per iteration (ms): 147164.7 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.698149E+00 | loss scale: 32768.0 | grad norm: 32364.069 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.75 | iteration 16558/ 292968 | consumed samples: 33910784 | consumed tokens: 17507975168 | elapsed time per iteration (ms): 148153.4 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.684976E+00 | loss scale: 32768.0 | grad norm: 35024.381 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 0.014 | TFLOPs: 87.17 | iteration 16559/ 292968 | consumed samples: 33912832 | consumed tokens: 17509974016 | elapsed time per iteration (ms): 147166.5 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.712494E+00 | loss scale: 32768.0 | grad norm: 29958.724 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.75 | iteration 16560/ 292968 | consumed samples: 33914880 | consumed tokens: 17511972864 | elapsed time per iteration (ms): 148619.0 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.679486E+00 | loss scale: 32768.0 | grad norm: 40440.750 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.89 | iteration 16561/ 292968 | consumed samples: 33916928 | consumed tokens: 17513971712 | elapsed time per iteration (ms): 146972.8 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.671917E+00 | loss scale: 32768.0 | grad norm: 22168.082 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.87 | iteration 16562/ 292968 | consumed samples: 33918976 | consumed tokens: 17515970560 | elapsed time per iteration (ms): 147103.1 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.692535E+00 | loss scale: 32768.0 | grad norm: 31135.337 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.79 | iteration 16563/ 292968 | consumed samples: 33921024 | consumed tokens: 17517969408 | elapsed time per iteration (ms): 147046.4 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.674490E+00 | loss scale: 32768.0 | grad norm: 30839.174 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 0.014 | TFLOPs: 87.82 | iteration 16564/ 292968 | consumed samples: 33923072 | consumed tokens: 17519968256 | elapsed time per iteration (ms): 147276.3 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.694007E+00 | loss scale: 32768.0 | grad norm: 36478.788 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.69 | iteration 16565/ 292968 | consumed samples: 33925120 | consumed tokens: 17521967104 | elapsed time per iteration (ms): 148281.2 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.685065E+00 | loss scale: 32768.0 | grad norm: 29511.726 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.09 | iteration 16566/ 292968 | consumed samples: 33927168 | consumed tokens: 17523965952 | elapsed time per iteration (ms): 147163.5 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.676938E+00 | loss scale: 32768.0 | grad norm: 26533.525 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.75 | iteration 16567/ 292968 | consumed samples: 33929216 | consumed tokens: 17525964800 | elapsed time per iteration (ms): 147091.8 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.682660E+00 | loss scale: 32768.0 | grad norm: 26193.737 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.80 | iteration 16568/ 292968 | consumed samples: 33931264 | consumed tokens: 17527963648 | elapsed time per iteration (ms): 148847.8 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.683111E+00 | loss scale: 32768.0 | grad norm: 27883.540 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.76 | iteration 16569/ 292968 | consumed samples: 33933312 | consumed tokens: 17529962496 | elapsed time per iteration (ms): 147118.2 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.689960E+00 | loss scale: 32768.0 | grad norm: 33820.817 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.78 | iteration 16570/ 292968 | consumed samples: 33935360 | consumed tokens: 17531961344 | elapsed time per iteration (ms): 147857.2 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.694718E+00 | loss scale: 32768.0 | grad norm: 43301.407 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.34 | iteration 16571/ 292968 | consumed samples: 33937408 | consumed tokens: 17533960192 | elapsed time per iteration (ms): 147097.0 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.706248E+00 | loss scale: 32768.0 | grad norm: 24170.636 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.79 | iteration 16572/ 292968 | consumed samples: 33939456 | consumed tokens: 17535959040 | elapsed time per iteration (ms): 146639.5 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.701426E+00 | loss scale: 32768.0 | grad norm: 34836.127 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.07 | iteration 16573/ 292968 | consumed samples: 33941504 | consumed tokens: 17537957888 | elapsed time per iteration (ms): 146630.2 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.693662E+00 | loss scale: 32768.0 | grad norm: 31329.944 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.07 | iteration 16574/ 292968 | consumed samples: 33943552 | consumed tokens: 17539956736 | elapsed time per iteration (ms): 146721.7 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.703522E+00 | loss scale: 32768.0 | grad norm: 25926.461 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.02 | iteration 16575/ 292968 | consumed samples: 33945600 | consumed tokens: 17541955584 | elapsed time per iteration (ms): 146699.3 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.680429E+00 | loss scale: 32768.0 | grad norm: 39157.163 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.03 | iteration 16576/ 292968 | consumed samples: 33947648 | consumed tokens: 17543954432 | elapsed time per iteration (ms): 148357.5 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.694868E+00 | loss scale: 32768.0 | grad norm: 26921.217 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.05 | iteration 16577/ 292968 | consumed samples: 33949696 | consumed tokens: 17545953280 | elapsed time per iteration (ms): 147856.1 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.662066E+00 | loss scale: 32768.0 | grad norm: 27845.630 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.34 | iteration 16578/ 292968 | consumed samples: 33951744 | consumed tokens: 17547952128 | elapsed time per iteration (ms): 146928.8 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.705008E+00 | loss scale: 32768.0 | grad norm: 33163.762 | num zeros: 0.0 | curriculum seqlen: 976 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.89 | iteration 16579/ 292968 | consumed samples: 33953792 | consumed tokens: 17549950976 | elapsed time per iteration (ms): 147138.6 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.714927E+00 | loss scale: 32768.0 | grad norm: 29047.443 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.77 | iteration 16580/ 292968 | consumed samples: 33955840 | consumed tokens: 17551949824 | elapsed time per iteration (ms): 146854.8 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.695311E+00 | loss scale: 32768.0 | grad norm: 42775.591 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.94 | iteration 16581/ 292968 | consumed samples: 33957888 | consumed tokens: 17553948672 | elapsed time per iteration (ms): 146958.5 | learning rate: 5.940E-05 | global batch size: 2048 | lm loss: 2.697895E+00 | loss scale: 32768.0 | grad norm: 23737.309 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.87 | iteration 16582/ 292968 | consumed samples: 33959936 | consumed tokens: 17555947520 | elapsed time per iteration (ms): 146875.0 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.697621E+00 | loss scale: 32768.0 | grad norm: 27046.235 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.92 | iteration 16583/ 292968 | consumed samples: 33961984 | consumed tokens: 17557946368 | elapsed time per iteration (ms): 147749.1 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.691438E+00 | loss scale: 32768.0 | grad norm: 34556.437 | num zeros: 0.0 | curriculum seqlen: 
976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.40 | iteration 16584/ 292968 | consumed samples: 33964032 | consumed tokens: 17559945216 | elapsed time per iteration (ms): 146815.9 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.698572E+00 | loss scale: 32768.0 | grad norm: 29056.362 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.96 | iteration 16585/ 292968 | consumed samples: 33966080 | consumed tokens: 17561944064 | elapsed time per iteration (ms): 146663.2 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.712933E+00 | loss scale: 32768.0 | grad norm: 34993.064 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.05 | iteration 16586/ 292968 | consumed samples: 33968128 | consumed tokens: 17563942912 | elapsed time per iteration (ms): 146437.0 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.683017E+00 | loss scale: 32768.0 | grad norm: 34334.813 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.19 | iteration 16587/ 292968 | consumed samples: 33970176 | consumed tokens: 17565941760 | elapsed time per iteration (ms): 146509.7 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.672415E+00 | loss scale: 32768.0 | grad norm: 36493.784 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.14 | iteration 16588/ 292968 | consumed samples: 33972224 | consumed tokens: 17567940608 | elapsed time per iteration (ms): 146740.9 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.692956E+00 | loss scale: 32768.0 | grad norm: 35153.664 | num zeros: 0.0 | 
curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.01 | iteration 16589/ 292968 | consumed samples: 33974272 | consumed tokens: 17569939456 | elapsed time per iteration (ms): 147301.9 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.694528E+00 | loss scale: 32768.0 | grad norm: 30362.185 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.67 | iteration 16590/ 292968 | consumed samples: 33976320 | consumed tokens: 17571938304 | elapsed time per iteration (ms): 147177.5 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.689544E+00 | loss scale: 32768.0 | grad norm: 36001.429 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.74 | iteration 16591/ 292968 | consumed samples: 33978368 | consumed tokens: 17573937152 | elapsed time per iteration (ms): 147309.7 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.688842E+00 | loss scale: 32768.0 | grad norm: 40218.196 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.67 | iteration 16592/ 292968 | consumed samples: 33980416 | consumed tokens: 17575936000 | elapsed time per iteration (ms): 146986.1 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.682816E+00 | loss scale: 32768.0 | grad norm: 31639.681 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.86 | iteration 16593/ 292968 | consumed samples: 33982464 | consumed tokens: 17577934848 | elapsed time per iteration (ms): 146927.2 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.686195E+00 | loss scale: 32768.0 | grad norm: 31279.959 | num 
zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.89 | iteration 16594/ 292968 | consumed samples: 33984512 | consumed tokens: 17579933696 | elapsed time per iteration (ms): 147024.9 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.692513E+00 | loss scale: 32768.0 | grad norm: 36643.227 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.84 | iteration 16595/ 292968 | consumed samples: 33986560 | consumed tokens: 17581932544 | elapsed time per iteration (ms): 146526.6 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.673665E+00 | loss scale: 32768.0 | grad norm: 31982.557 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.13 | iteration 16596/ 292968 | consumed samples: 33988608 | consumed tokens: 17583931392 | elapsed time per iteration (ms): 146757.8 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.700625E+00 | loss scale: 32768.0 | grad norm: 38554.422 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.99 | iteration 16597/ 292968 | consumed samples: 33990656 | consumed tokens: 17585930240 | elapsed time per iteration (ms): 146469.5 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.700515E+00 | loss scale: 32768.0 | grad norm: 22330.704 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.17 | iteration 16598/ 292968 | consumed samples: 33992704 | consumed tokens: 17587929088 | elapsed time per iteration (ms): 147636.6 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.707489E+00 | loss scale: 32768.0 | grad norm: 
31980.767 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.47 | iteration 16599/ 292968 | consumed samples: 33994752 | consumed tokens: 17589927936 | elapsed time per iteration (ms): 146371.6 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.693807E+00 | loss scale: 32768.0 | grad norm: 44007.474 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.23 | iteration 16600/ 292968 | consumed samples: 33996800 | consumed tokens: 17591926784 | elapsed time per iteration (ms): 146443.6 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.667416E+00 | loss scale: 32768.0 | grad norm: 20962.304 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.18 | saving checkpoint at iteration 16600 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 [2022-02-05 22:55:27,756] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/mp_rank_00_model_states.pt [2022-02-05 22:55:27,836] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/mp_rank_01_model_states.pt [2022-02-05 22:55:45,883] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-02-05 22:55:46,043] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-02-05 22:55:46,209] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-02-05 22:55:46,265] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-02-05 22:55:47,582] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-02-05 22:55:47,582] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-02-05 22:55:47,662] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-02-05 22:55:48,536] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-02-05 22:55:48,718] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_124_optim_states.pt [2022-02-05 22:55:49,472] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-02-05 22:55:49,885] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_110_optim_states.pt 
[2022-02-05 22:55:50,011] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-02-05 22:55:50,365] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-02-05 22:55:50,378] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-02-05 22:55:50,939] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-02-05 22:55:50,942] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-02-05 22:55:51,656] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-02-05 22:55:51,652] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-02-05 22:55:52,133] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-02-05 22:55:52,249] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-02-05 22:55:52,412] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-02-05 22:55:52,635] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-02-05 22:55:52,674] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-02-05 22:55:52,687] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-02-05 22:55:52,667] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-02-05 22:55:52,979] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-02-05 22:55:53,002] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-02-05 22:55:53,132] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-02-05 22:55:53,619] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-02-05 22:55:53,679] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-02-05 22:55:53,796] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-02-05 22:55:53,908] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-02-05 22:55:53,947] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-02-05 22:55:53,956] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-02-05 22:55:54,194] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-02-05 22:55:54,417] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-02-05 22:55:54,478] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-02-05 22:55:54,564] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-02-05 22:55:54,769] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-02-05 22:55:54,793] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-02-05 22:55:54,837] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-02-05 22:55:54,853] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-02-05 22:55:54,914] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-02-05 22:55:54,987] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-02-05 22:55:54,993] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-02-05 22:55:55,053] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-02-05 
22:55:55,274] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-02-05 22:55:55,505] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-02-05 22:55:55,514] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-02-05 22:55:55,485] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-02-05 22:55:55,576] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-02-05 22:55:55,774] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-02-05 22:55:55,795] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-02-05 22:55:55,797] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-02-05 22:55:55,795] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-02-05 22:55:56,067] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-02-05 22:55:56,145] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-02-05 22:55:56,338] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-02-05 22:55:56,207] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-02-05 22:55:56,361] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-02-05 22:55:56,375] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-02-05 22:55:56,443] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-02-05 22:55:56,469] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-02-05 22:55:56,747] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-02-05 22:55:56,870] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-02-05 22:55:56,906] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-02-05 22:55:56,943] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-02-05 22:55:57,001] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-02-05 22:55:57,422] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-02-05 22:55:57,479] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-02-05 22:55:57,496] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-02-05 22:55:57,535] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-02-05 22:55:57,552] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-02-05 22:55:57,613] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-02-05 22:55:57,687] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-02-05 22:55:57,728] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-02-05 22:55:57,737] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-02-05 22:55:58,210] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-02-05 22:55:58,342] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-02-05 22:55:58,842] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-02-05 22:55:59,098] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-02-05 
22:55:59,139] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-02-05 22:55:59,239] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-02-05 22:55:59,247] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-02-05 22:55:59,324] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-02-05 22:55:59,411] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-02-05 22:55:59,755] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-02-05 22:55:59,990] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-02-05 22:56:00,209] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-02-05 22:56:00,251] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-02-05 22:56:00,367] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-02-05 22:56:00,440] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-02-05 22:56:00,478] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-02-05 22:56:00,740] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-02-05 22:56:01,287] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-02-05 22:56:01,326] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-02-05 22:56:01,331] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-02-05 22:56:01,383] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-02-05 22:56:01,452] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-02-05 22:56:01,834] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-02-05 22:56:02,221] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-02-05 22:56:02,457] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-02-05 22:56:02,487] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-02-05 22:56:02,493] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-02-05 22:56:02,520] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-02-05 22:56:02,582] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-02-05 22:56:02,783] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-02-05 22:56:02,868] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-02-05 22:56:03,052] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-02-05 22:56:03,125] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-02-05 22:56:03,160] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-02-05 22:56:03,490] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-02-05 22:56:04,780] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-02-05 22:56:04,884] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-02-05 22:56:05,546] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-02-05 22:56:05,561] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-02-05 
22:56:05,630] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-02-05 22:56:05,675] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-02-05 22:56:06,373] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-02-05 22:56:06,475] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-02-05 22:56:07,555] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-02-05 22:56:07,601] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-02-05 22:56:07,877] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-02-05 22:56:08,143] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-02-05 22:56:08,253] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-02-05 22:56:08,658] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-02-05 22:56:10,014] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-02-05 22:56:10,124] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16600/zero_pp_rank_0_mp_rank_52_optim_states.pt successfully saved checkpoint at iteration 16600 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 time (ms) | save-checkpoint: 47394.94 iteration 16601/ 292968 | consumed samples: 33998848 | consumed tokens: 17593925632 | elapsed time per iteration (ms): 194910.6 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.700355E+00 | loss scale: 32768.0 | grad norm: 43332.147 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.011 | TFLOPs: 66.26 | iteration 16602/ 292968 | consumed samples: 34000896 | consumed tokens: 17595924480 | elapsed time per iteration (ms): 146246.6 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.663475E+00 | loss scale: 32768.0 | grad norm: 35194.389 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.30 | iteration 16603/ 292968 | consumed samples: 34002944 | consumed tokens: 17597923328 | elapsed time per iteration (ms): 146474.8 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.678913E+00 | loss scale: 32768.0 | 
grad norm: 32045.294 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.16 | iteration 16604/ 292968 | consumed samples: 34004992 | consumed tokens: 17599922176 | elapsed time per iteration (ms): 148400.2 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.679063E+00 | loss scale: 32768.0 | grad norm: 34058.192 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.02 | iteration 16605/ 292968 | consumed samples: 34007040 | consumed tokens: 17601921024 | elapsed time per iteration (ms): 146704.1 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.702412E+00 | loss scale: 32768.0 | grad norm: 38479.530 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.03 | iteration 16606/ 292968 | consumed samples: 34009088 | consumed tokens: 17603919872 | elapsed time per iteration (ms): 146645.2 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.687759E+00 | loss scale: 32768.0 | grad norm: 27805.725 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.06 | iteration 16607/ 292968 | consumed samples: 34011136 | consumed tokens: 17605918720 | elapsed time per iteration (ms): 146881.7 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.709002E+00 | loss scale: 32768.0 | grad norm: 26100.444 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.92 | iteration 16608/ 292968 | consumed samples: 34013184 | consumed tokens: 17607917568 | elapsed time per iteration (ms): 146945.4 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.690386E+00 | loss scale: 
32768.0 | grad norm: 30174.606 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.88 | iteration 16609/ 292968 | consumed samples: 34015232 | consumed tokens: 17609916416 | elapsed time per iteration (ms): 146608.4 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.701922E+00 | loss scale: 32768.0 | grad norm: 37205.814 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.08 | iteration 16610/ 292968 | consumed samples: 34017280 | consumed tokens: 17611915264 | elapsed time per iteration (ms): 146539.0 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.709726E+00 | loss scale: 32768.0 | grad norm: 32004.159 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.13 | iteration 16611/ 292968 | consumed samples: 34019328 | consumed tokens: 17613914112 | elapsed time per iteration (ms): 146950.2 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.688086E+00 | loss scale: 32768.0 | grad norm: 37586.630 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.88 | iteration 16612/ 292968 | consumed samples: 34021376 | consumed tokens: 17615912960 | elapsed time per iteration (ms): 147109.3 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.666145E+00 | loss scale: 32768.0 | grad norm: 30689.824 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.78 | iteration 16613/ 292968 | consumed samples: 34023424 | consumed tokens: 17617911808 | elapsed time per iteration (ms): 146684.0 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.689226E+00 | 
loss scale: 32768.0 | grad norm: 21933.265 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.04 | iteration 16614/ 292968 | consumed samples: 34025472 | consumed tokens: 17619910656 | elapsed time per iteration (ms): 146532.7 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.696162E+00 | loss scale: 32768.0 | grad norm: 25066.532 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.13 | iteration 16615/ 292968 | consumed samples: 34027520 | consumed tokens: 17621909504 | elapsed time per iteration (ms): 147174.8 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.701900E+00 | loss scale: 32768.0 | grad norm: 31097.077 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.75 | iteration 16616/ 292968 | consumed samples: 34029568 | consumed tokens: 17623908352 | elapsed time per iteration (ms): 147463.3 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.672701E+00 | loss scale: 32768.0 | grad norm: 34972.890 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.57 | iteration 16617/ 292968 | consumed samples: 34031616 | consumed tokens: 17625907200 | elapsed time per iteration (ms): 146637.4 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.704607E+00 | loss scale: 32768.0 | grad norm: 31372.268 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.07 | iteration 16618/ 292968 | consumed samples: 34033664 | consumed tokens: 17627906048 | elapsed time per iteration (ms): 146538.0 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 
2.714944E+00 | loss scale: 32768.0 | grad norm: 28292.001 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.13 | iteration 16619/ 292968 | consumed samples: 34035712 | consumed tokens: 17629904896 | elapsed time per iteration (ms): 146773.1 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.712318E+00 | loss scale: 32768.0 | grad norm: 36480.852 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.99 | iteration 16620/ 292968 | consumed samples: 34037760 | consumed tokens: 17631903744 | elapsed time per iteration (ms): 146822.0 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.713303E+00 | loss scale: 32768.0 | grad norm: 22799.735 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.96 | iteration 16621/ 292968 | consumed samples: 34039808 | consumed tokens: 17633902592 | elapsed time per iteration (ms): 146789.5 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.688290E+00 | loss scale: 32768.0 | grad norm: 47620.242 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.98 | iteration 16622/ 292968 | consumed samples: 34041856 | consumed tokens: 17635901440 | elapsed time per iteration (ms): 146885.3 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.681961E+00 | loss scale: 32768.0 | grad norm: 38669.105 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.92 | iteration 16623/ 292968 | consumed samples: 34043904 | consumed tokens: 17637900288 | elapsed time per iteration (ms): 146489.2 | learning rate: 5.939E-05 | global batch size: 2048 
| lm loss: 2.708572E+00 | loss scale: 32768.0 | grad norm: 28993.331 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.16 | iteration 16624/ 292968 | consumed samples: 34045952 | consumed tokens: 17639899136 | elapsed time per iteration (ms): 146674.0 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.722896E+00 | loss scale: 32768.0 | grad norm: 32045.300 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.05 | iteration 16625/ 292968 | consumed samples: 34048000 | consumed tokens: 17641897984 | elapsed time per iteration (ms): 147842.5 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.717967E+00 | loss scale: 32768.0 | grad norm: 43545.069 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.35 | iteration 16626/ 292968 | consumed samples: 34050048 | consumed tokens: 17643896832 | elapsed time per iteration (ms): 147634.7 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.681210E+00 | loss scale: 32768.0 | grad norm: 22123.205 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.47 | iteration 16627/ 292968 | consumed samples: 34052096 | consumed tokens: 17645895680 | elapsed time per iteration (ms): 146878.2 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.697361E+00 | loss scale: 32768.0 | grad norm: 37381.511 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.92 | iteration 16628/ 292968 | consumed samples: 34054144 | consumed tokens: 17647894528 | elapsed time per iteration (ms): 147176.7 | learning rate: 5.939E-05 | global batch 
size: 2048 | lm loss: 2.688543E+00 | loss scale: 32768.0 | grad norm: 32715.899 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.74 | iteration 16629/ 292968 | consumed samples: 34056192 | consumed tokens: 17649893376 | elapsed time per iteration (ms): 146824.5 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.694381E+00 | loss scale: 32768.0 | grad norm: 35166.864 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.95 | iteration 16630/ 292968 | consumed samples: 34058240 | consumed tokens: 17651892224 | elapsed time per iteration (ms): 147502.6 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.688735E+00 | loss scale: 32768.0 | grad norm: 34927.058 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.55 | iteration 16631/ 292968 | consumed samples: 34060288 | consumed tokens: 17653891072 | elapsed time per iteration (ms): 146912.9 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.705074E+00 | loss scale: 32768.0 | grad norm: 29073.037 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.90 | iteration 16632/ 292968 | consumed samples: 34062336 | consumed tokens: 17655889920 | elapsed time per iteration (ms): 147005.8 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.700395E+00 | loss scale: 32768.0 | grad norm: 37636.938 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.85 | iteration 16633/ 292968 | consumed samples: 34064384 | consumed tokens: 17657888768 | elapsed time per iteration (ms): 147241.0 | learning rate: 5.939E-05 | 
global batch size: 2048 | lm loss: 2.697080E+00 | loss scale: 32768.0 | grad norm: 23857.240 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.71 | iteration 16634/ 292968 | consumed samples: 34066432 | consumed tokens: 17659887616 | elapsed time per iteration (ms): 146296.4 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.695069E+00 | loss scale: 32768.0 | grad norm: 24654.752 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.27 | iteration 16635/ 292968 | consumed samples: 34068480 | consumed tokens: 17661886464 | elapsed time per iteration (ms): 146371.5 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.689358E+00 | loss scale: 32768.0 | grad norm: 38924.007 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.23 | iteration 16636/ 292968 | consumed samples: 34070528 | consumed tokens: 17663885312 | elapsed time per iteration (ms): 146806.8 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.709694E+00 | loss scale: 32768.0 | grad norm: 32383.255 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.97 | iteration 16637/ 292968 | consumed samples: 34072576 | consumed tokens: 17665884160 | elapsed time per iteration (ms): 146799.0 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.702575E+00 | loss scale: 32768.0 | grad norm: 40444.780 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.97 | iteration 16638/ 292968 | consumed samples: 34074624 | consumed tokens: 17667883008 | elapsed time per iteration (ms): 146833.5 | learning rate: 
5.939E-05 | global batch size: 2048 | lm loss: 2.675325E+00 | loss scale: 32768.0 | grad norm: 31060.678 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.95 | iteration 16639/ 292968 | consumed samples: 34076672 | consumed tokens: 17669881856 | elapsed time per iteration (ms): 148533.1 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.702532E+00 | loss scale: 32768.0 | grad norm: 32907.328 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.94 | iteration 16640/ 292968 | consumed samples: 34078720 | consumed tokens: 17671880704 | elapsed time per iteration (ms): 146663.8 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.683746E+00 | loss scale: 32768.0 | grad norm: 38109.198 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.05 | iteration 16641/ 292968 | consumed samples: 34080768 | consumed tokens: 17673879552 | elapsed time per iteration (ms): 146635.0 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.679994E+00 | loss scale: 32768.0 | grad norm: 23136.249 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.07 | iteration 16642/ 292968 | consumed samples: 34082816 | consumed tokens: 17675878400 | elapsed time per iteration (ms): 146636.7 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.685842E+00 | loss scale: 32768.0 | grad norm: 23502.630 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.07 | iteration 16643/ 292968 | consumed samples: 34084864 | consumed tokens: 17677877248 | elapsed time per iteration (ms): 146969.5 | 
learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.705220E+00 | loss scale: 32768.0 | grad norm: 36106.309 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.87 | iteration 16644/ 292968 | consumed samples: 34086912 | consumed tokens: 17679876096 | elapsed time per iteration (ms): 147459.0 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.673936E+00 | loss scale: 32768.0 | grad norm: 30538.751 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.58 | iteration 16645/ 292968 | consumed samples: 34088960 | consumed tokens: 17681874944 | elapsed time per iteration (ms): 146457.8 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.695421E+00 | loss scale: 65536.0 | grad norm: 31484.771 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.18 | iteration 16646/ 292968 | consumed samples: 34091008 | consumed tokens: 17683873792 | elapsed time per iteration (ms): 147263.1 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.674103E+00 | loss scale: 65536.0 | grad norm: 56148.804 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.69 | iteration 16647/ 292968 | consumed samples: 34093056 | consumed tokens: 17685872640 | elapsed time per iteration (ms): 147497.2 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.675219E+00 | loss scale: 65536.0 | grad norm: 109637.745 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.55 | iteration 16648/ 292968 | consumed samples: 34095104 | consumed tokens: 17687871488 | elapsed time per iteration (ms): 
147060.6 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.683588E+00 | loss scale: 65536.0 | grad norm: 48522.354 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.81 |
iteration 16649/ 292968 | consumed samples: 34097152 | consumed tokens: 17689870336 | elapsed time per iteration (ms): 148142.4 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.721914E+00 | loss scale: 65536.0 | grad norm: 108154.693 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.17 |
iteration 16650/ 292968 | consumed samples: 34099200 | consumed tokens: 17691869184 | elapsed time per iteration (ms): 147447.8 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.669487E+00 | loss scale: 65536.0 | grad norm: 60907.905 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.58 |
-------------------------------------------------------------------------------------------
valid loss at iteration 16650 | lm loss value: 3.079885E+00 | lm loss PPL: 2.175589E+01 |
-------------------------------------------------------------------------------------------
saving checkpoint at iteration 16650 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
[2022-02-06 01:06:37,592] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/mp_rank_00_model_states.pt
[2022-02-06 01:06:37,791] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/mp_rank_01_model_states.pt
[2022-02-06 01:07:49,832] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-02-06 01:07:58,815] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-02-06 01:07:59,332] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-02-06 01:08:00,575] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-02-06 01:08:00,874] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-02-06 01:08:01,294] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-02-06 01:08:01,790] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-02-06 01:08:01,808] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-02-06 01:08:02,178] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-02-06 01:08:02,345] [INFO] [engine.py:3023:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-02-06 01:08:02,414] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-02-06 01:08:02,591] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-02-06 01:08:02,732] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-02-06 01:08:02,717] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-02-06 01:08:02,838] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-02-06 01:08:02,855] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-02-06 01:08:02,967] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-02-06 01:08:02,984] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-02-06 01:08:03,153] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-02-06 01:08:03,254] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-02-06 01:08:03,302] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-02-06 01:08:03,389] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-02-06 01:08:03,385] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-02-06 01:08:03,524] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-02-06 01:08:03,624] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-02-06 01:08:03,645] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-02-06 01:08:04,188] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-02-06 
01:08:04,322] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-02-06 01:08:04,385] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-02-06 01:08:04,415] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-02-06 01:08:04,550] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-02-06 01:08:04,575] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-02-06 01:08:04,583] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-02-06 01:08:04,576] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-02-06 01:08:04,691] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-02-06 01:08:04,973] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-02-06 01:08:05,039] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-02-06 01:08:05,050] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-02-06 01:08:05,079] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-02-06 01:08:05,251] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-02-06 01:08:05,350] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-02-06 01:08:05,522] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-02-06 01:08:05,593] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-02-06 01:08:05,766] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-02-06 01:08:05,823] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-02-06 01:08:05,864] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-02-06 01:08:05,887] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-02-06 01:08:05,895] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-02-06 01:08:05,938] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-02-06 01:08:05,977] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-02-06 01:08:06,014] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-02-06 01:08:06,060] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-02-06 01:08:06,096] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-02-06 01:08:06,116] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-02-06 01:08:06,156] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-02-06 01:08:06,136] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-02-06 01:08:06,184] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-02-06 01:08:06,246] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-02-06 01:08:06,452] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-02-06 01:08:06,501] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-02-06 01:08:06,523] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-02-06 01:08:06,557] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-02-06 
01:08:06,566] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-02-06 01:08:06,751] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-02-06 01:08:06,862] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-02-06 01:08:07,021] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-02-06 01:08:07,049] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-02-06 01:08:07,062] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-02-06 01:08:07,079] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-02-06 01:08:07,201] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-02-06 01:08:07,261] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-02-06 01:08:07,402] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-02-06 01:08:07,502] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-02-06 01:08:07,604] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-02-06 01:08:07,630] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-02-06 01:08:07,682] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-02-06 01:08:07,688] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-02-06 01:08:07,912] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-02-06 01:08:08,006] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-02-06 01:08:08,017] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-02-06 01:08:08,090] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-02-06 01:08:08,125] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-02-06 01:08:08,136] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-02-06 01:08:08,180] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-02-06 01:08:08,247] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-02-06 01:08:08,433] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-02-06 01:08:08,496] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-02-06 01:08:08,640] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-02-06 01:08:08,653] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-02-06 01:08:08,659] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-02-06 01:08:08,825] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-02-06 01:08:08,837] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-02-06 01:08:08,945] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-02-06 01:08:09,364] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-02-06 01:08:09,827] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-02-06 01:08:09,841] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-02-06 01:08:10,162] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-02-06 
01:08:11,525] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-02-06 01:08:11,540] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-02-06 01:08:11,730] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-02-06 01:08:11,849] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-02-06 01:08:12,301] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-02-06 01:08:12,558] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-02-06 01:08:12,800] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-02-06 01:08:13,296] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-02-06 01:08:13,342] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-02-06 01:08:13,574] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-02-06 01:08:14,647] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-02-06 01:08:14,989] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-02-06 01:08:15,928] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-02-06 01:08:15,987] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-02-06 01:08:16,117] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-02-06 01:08:16,157] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-02-06 01:08:16,451] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-02-06 01:08:17,577] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-02-06 01:08:17,811] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-02-06 01:08:22,065] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-02-06 01:08:25,058] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-02-06 01:08:25,059] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-02-06 01:08:25,124] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-02-06 01:08:26,335] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-02-06 01:08:26,519] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-02-06 01:08:36,366] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-02-06 01:08:48,841] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_124_optim_states.pt
[2022-02-06 01:08:57,778] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_123_optim_states.pt
[2022-02-06 01:09:07,446] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_121_optim_states.pt
[2022-02-06 01:09:18,543] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_126_optim_states.pt
[2022-02-06 01:09:26,975] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16650/zero_pp_rank_0_mp_rank_120_optim_states.pt
successfully saved checkpoint at iteration 16650 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
time (ms) | save-checkpoint: 192625.32
iteration 16651/ 292968 | consumed samples: 34101248 | consumed tokens: 17693868032 | elapsed time per iteration (ms): 795197.9 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.719188E+00 | loss scale: 65536.0 | grad norm: 83191.783 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.003 | TFLOPs: 16.24 |
iteration 16652/ 292968 | consumed samples: 34103296 | consumed tokens: 17695866880 | elapsed time per iteration (ms): 147626.7 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.697633E+00 | loss scale: 65536.0 | grad norm: 71675.318 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan
iterations: 0 | samples per second: 0.014 | TFLOPs: 87.48 | iteration 16653/ 292968 | consumed samples: 34105344 | consumed tokens: 17697865728 | elapsed time per iteration (ms): 146915.0 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.701860E+00 | loss scale: 65536.0 | grad norm: 59149.950 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.90 | iteration 16654/ 292968 | consumed samples: 34107392 | consumed tokens: 17699864576 | elapsed time per iteration (ms): 146563.1 | learning rate: 5.939E-05 | global batch size: 2048 | lm loss: 2.686182E+00 | loss scale: 65536.0 | grad norm: 90620.401 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.11 | iteration 16655/ 292968 | consumed samples: 34109440 | consumed tokens: 17701863424 | elapsed time per iteration (ms): 146955.1 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.676085E+00 | loss scale: 65536.0 | grad norm: 52601.134 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.88 | iteration 16656/ 292968 | consumed samples: 34111488 | consumed tokens: 17703862272 | elapsed time per iteration (ms): 149249.0 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.684921E+00 | loss scale: 65536.0 | grad norm: 63359.605 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.53 | iteration 16657/ 292968 | consumed samples: 34113536 | consumed tokens: 17705861120 | elapsed time per iteration (ms): 148033.2 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.667128E+00 | loss scale: 65536.0 | grad norm: 58659.633 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.24 | iteration 16658/ 292968 | consumed samples: 34115584 | consumed tokens: 17707859968 | elapsed time per iteration (ms): 147016.2 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.697762E+00 | loss scale: 65536.0 | grad norm: 71477.197 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.84 | iteration 16659/ 292968 | consumed samples: 34117632 | consumed tokens: 17709858816 | elapsed time per iteration (ms): 147066.9 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.702721E+00 | loss scale: 65536.0 | grad norm: 91689.670 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.81 | iteration 16660/ 292968 | consumed samples: 34119680 | consumed tokens: 17711857664 | elapsed time per iteration (ms): 147340.1 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.686311E+00 | loss scale: 65536.0 | grad norm: 46502.516 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.65 | iteration 16661/ 292968 | consumed samples: 34121728 | consumed tokens: 17713856512 | elapsed time per iteration (ms): 146960.5 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.669430E+00 | loss scale: 65536.0 | grad norm: 66379.024 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.87 | iteration 16662/ 292968 | consumed samples: 34123776 | consumed tokens: 17715855360 | elapsed time per iteration (ms): 147682.0 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.697407E+00 | loss scale: 65536.0 | grad norm: 95316.326 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.44 | iteration 16663/ 292968 | consumed samples: 34125824 | consumed tokens: 17717854208 | elapsed time per iteration (ms): 149041.9 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.663357E+00 | loss scale: 65536.0 | grad norm: 38197.534 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.65 | iteration 16664/ 292968 | consumed samples: 34127872 | consumed tokens: 17719853056 | elapsed time per iteration (ms): 150371.7 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.689462E+00 | loss scale: 65536.0 | grad norm: 117747.391 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.88 | iteration 16665/ 292968 | consumed samples: 34129920 | consumed tokens: 17721851904 | elapsed time per iteration (ms): 147241.3 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.685837E+00 | loss scale: 65536.0 | grad norm: 48699.205 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.71 | iteration 16666/ 292968 | consumed samples: 34131968 | consumed tokens: 17723850752 | elapsed time per iteration (ms): 147290.0 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.675823E+00 | loss scale: 65536.0 | grad norm: 84781.913 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.68 | iteration 16667/ 292968 | consumed samples: 34134016 | consumed tokens: 17725849600 | elapsed time per iteration (ms): 146829.4 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.679126E+00 | loss scale: 65536.0 | grad norm: 63128.500 | num zeros: 0.0 | curriculum seqlen: 976 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.95 | iteration 16668/ 292968 | consumed samples: 34136064 | consumed tokens: 17727848448 | elapsed time per iteration (ms): 146749.8 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.676843E+00 | loss scale: 65536.0 | grad norm: 64752.413 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.00 | iteration 16669/ 292968 | consumed samples: 34138112 | consumed tokens: 17729847296 | elapsed time per iteration (ms): 147696.8 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.662843E+00 | loss scale: 65536.0 | grad norm: 70216.042 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.44 | iteration 16670/ 292968 | consumed samples: 34140160 | consumed tokens: 17731846144 | elapsed time per iteration (ms): 149539.1 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.695504E+00 | loss scale: 65536.0 | grad norm: 82028.739 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.36 | iteration 16671/ 292968 | consumed samples: 34142208 | consumed tokens: 17733844992 | elapsed time per iteration (ms): 146780.4 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.672150E+00 | loss scale: 65536.0 | grad norm: 57688.408 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.98 | iteration 16672/ 292968 | consumed samples: 34144256 | consumed tokens: 17735843840 | elapsed time per iteration (ms): 146606.3 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.656594E+00 | loss scale: 65536.0 | grad norm: 54149.773 | num zeros: 0.0 | curriculum seqlen: 
976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.09 | iteration 16673/ 292968 | consumed samples: 34146304 | consumed tokens: 17737842688 | elapsed time per iteration (ms): 146769.5 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.679446E+00 | loss scale: 65536.0 | grad norm: 58811.291 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.99 | iteration 16674/ 292968 | consumed samples: 34148352 | consumed tokens: 17739841536 | elapsed time per iteration (ms): 147563.1 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.673062E+00 | loss scale: 65536.0 | grad norm: 77211.118 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.51 | iteration 16675/ 292968 | consumed samples: 34150400 | consumed tokens: 17741840384 | elapsed time per iteration (ms): 148242.1 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.676666E+00 | loss scale: 65536.0 | grad norm: 68534.585 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.11 | iteration 16676/ 292968 | consumed samples: 34152448 | consumed tokens: 17743839232 | elapsed time per iteration (ms): 146843.5 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.679420E+00 | loss scale: 65536.0 | grad norm: 60634.959 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.94 | iteration 16677/ 292968 | consumed samples: 34154496 | consumed tokens: 17745838080 | elapsed time per iteration (ms): 146913.9 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.698879E+00 | loss scale: 65536.0 | grad norm: 80628.667 | num zeros: 0.0 | 
curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.90 | iteration 16678/ 292968 | consumed samples: 34156544 | consumed tokens: 17747836928 | elapsed time per iteration (ms): 148587.7 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.675501E+00 | loss scale: 65536.0 | grad norm: 71431.056 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.91 | iteration 16679/ 292968 | consumed samples: 34158592 | consumed tokens: 17749835776 | elapsed time per iteration (ms): 148910.5 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.678594E+00 | loss scale: 65536.0 | grad norm: 89447.189 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.72 | iteration 16680/ 292968 | consumed samples: 34160640 | consumed tokens: 17751834624 | elapsed time per iteration (ms): 147392.2 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.665673E+00 | loss scale: 65536.0 | grad norm: 44638.308 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.62 | iteration 16681/ 292968 | consumed samples: 34162688 | consumed tokens: 17753833472 | elapsed time per iteration (ms): 147101.5 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.698293E+00 | loss scale: 65536.0 | grad norm: 72422.423 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.79 | iteration 16682/ 292968 | consumed samples: 34164736 | consumed tokens: 17755832320 | elapsed time per iteration (ms): 148147.0 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.670125E+00 | loss scale: 65536.0 | grad norm: 101699.350 | num 
zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.17 | iteration 16683/ 292968 | consumed samples: 34166784 | consumed tokens: 17757831168 | elapsed time per iteration (ms): 147194.6 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.692187E+00 | loss scale: 65536.0 | grad norm: 43133.109 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.73 | iteration 16684/ 292968 | consumed samples: 34168832 | consumed tokens: 17759830016 | elapsed time per iteration (ms): 148877.7 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.683364E+00 | loss scale: 65536.0 | grad norm: 167748.036 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.74 | iteration 16685/ 292968 | consumed samples: 34170880 | consumed tokens: 17761828864 | elapsed time per iteration (ms): 147073.1 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.696322E+00 | loss scale: 65536.0 | grad norm: 88843.897 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.81 | iteration 16686/ 292968 | consumed samples: 34172928 | consumed tokens: 17763827712 | elapsed time per iteration (ms): 147199.4 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.702125E+00 | loss scale: 65536.0 | grad norm: 148721.764 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.73 | iteration 16687/ 292968 | consumed samples: 34174976 | consumed tokens: 17765826560 | elapsed time per iteration (ms): 148605.6 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.698012E+00 | loss scale: 65536.0 | grad norm: 
146194.590 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.90 | iteration 16688/ 292968 | consumed samples: 34177024 | consumed tokens: 17767825408 | elapsed time per iteration (ms): 147081.7 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.673876E+00 | loss scale: 65536.0 | grad norm: 67713.674 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.80 | iteration 16689/ 292968 | consumed samples: 34179072 | consumed tokens: 17769824256 | elapsed time per iteration (ms): 147920.7 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.710619E+00 | loss scale: 65536.0 | grad norm: 94330.714 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.30 | iteration 16690/ 292968 | consumed samples: 34181120 | consumed tokens: 17771823104 | elapsed time per iteration (ms): 147408.7 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.690161E+00 | loss scale: 65536.0 | grad norm: 62106.272 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.61 | iteration 16691/ 292968 | consumed samples: 34183168 | consumed tokens: 17773821952 | elapsed time per iteration (ms): 147261.3 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.696228E+00 | loss scale: 65536.0 | grad norm: 85796.384 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.69 | iteration 16692/ 292968 | consumed samples: 34185216 | consumed tokens: 17775820800 | elapsed time per iteration (ms): 147084.4 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.699748E+00 | loss scale: 65536.0 | 
grad norm: 50948.694 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.80 | iteration 16693/ 292968 | consumed samples: 34187264 | consumed tokens: 17777819648 | elapsed time per iteration (ms): 147048.2 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.683928E+00 | loss scale: 65536.0 | grad norm: 62218.597 | num zeros: 0.0 | curriculum seqlen: 976 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.82 | iteration 16694/ 292968 | consumed samples: 34189312 | consumed tokens: 17779834880 | elapsed time per iteration (ms): 150044.8 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.657199E+00 | loss scale: 65536.0 | grad norm: 76979.342 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.77 | iteration 16695/ 292968 | consumed samples: 34191360 | consumed tokens: 17781850112 | elapsed time per iteration (ms): 148526.7 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.655814E+00 | loss scale: 65536.0 | grad norm: 54881.584 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.66 | iteration 16696/ 292968 | consumed samples: 34193408 | consumed tokens: 17783865344 | elapsed time per iteration (ms): 149685.5 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.651707E+00 | loss scale: 65536.0 | grad norm: 90822.950 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.98 | iteration 16697/ 292968 | consumed samples: 34195456 | consumed tokens: 17785880576 | elapsed time per iteration (ms): 149035.7 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.690256E+00 | loss scale: 
65536.0 | grad norm: 40437.959 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.36 | iteration 16698/ 292968 | consumed samples: 34197504 | consumed tokens: 17787895808 | elapsed time per iteration (ms): 149473.5 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.692542E+00 | loss scale: 65536.0 | grad norm: 59736.399 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.10 | iteration 16699/ 292968 | consumed samples: 34199552 | consumed tokens: 17789911040 | elapsed time per iteration (ms): 149014.1 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.694221E+00 | loss scale: 65536.0 | grad norm: 102549.292 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.37 | iteration 16700/ 292968 | consumed samples: 34201600 | consumed tokens: 17791926272 | elapsed time per iteration (ms): 149915.9 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.682245E+00 | loss scale: 65536.0 | grad norm: 43248.725 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.85 | saving checkpoint at iteration 16700 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 [2022-02-06 03:12:54,252] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/mp_rank_01_model_states.pt [2022-02-06 03:12:54,299] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/mp_rank_00_model_states.pt [2022-02-06 03:13:27,296] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-02-06 03:13:28,357] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-02-06 03:13:29,538] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-02-06 03:13:29,653] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-02-06 03:13:29,666] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-02-06 03:13:30,048] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-02-06 03:13:30,088] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-02-06 03:13:30,185] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-02-06 03:13:30,295] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-02-06 03:13:30,316] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-02-06 03:13:30,363] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-02-06 03:13:30,469] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-02-06 03:13:30,912] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-02-06 03:13:30,944] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-02-06 03:13:31,022] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-02-06 03:13:31,233] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-02-06 03:13:31,243] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-02-06 03:13:31,252] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-02-06 
03:13:31,450] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-02-06 03:13:31,747] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-02-06 03:13:31,942] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-02-06 03:13:32,002] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-02-06 03:13:32,022] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-02-06 03:13:32,200] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-02-06 03:13:32,289] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-02-06 03:13:32,294] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-02-06 03:13:32,384] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-02-06 03:13:32,789] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-02-06 03:13:32,817] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-02-06 03:13:32,834] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-02-06 03:13:33,083] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-02-06 03:13:33,263] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-02-06 03:13:33,323] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-02-06 03:13:33,405] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-02-06 03:13:33,525] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-02-06 03:13:33,538] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-02-06 03:13:33,614] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-02-06 03:13:33,614] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-02-06 03:13:33,663] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-02-06 03:13:33,706] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-02-06 03:13:33,756] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-02-06 03:13:33,765] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-02-06 03:13:33,825] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-02-06 03:13:34,033] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-02-06 03:13:34,174] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-02-06 03:13:34,272] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-02-06 03:13:34,358] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-02-06 03:13:34,496] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-02-06 03:13:34,481] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-02-06 03:13:34,602] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-02-06 03:13:34,775] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-02-06 03:13:34,796] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-02-06 03:13:34,862] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-02-06 
03:13:34,822] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-02-06 03:13:35,004] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-02-06 03:13:35,103] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-02-06 03:13:35,253] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-02-06 03:13:35,237] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-02-06 03:13:35,265] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-02-06 03:13:35,339] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_124_optim_states.pt [2022-02-06 03:13:35,351] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-02-06 03:13:35,429] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-02-06 03:13:35,667] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-02-06 03:13:35,895] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-02-06 03:13:35,908] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-02-06 03:13:35,930] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-02-06 03:13:35,899] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-02-06 03:13:35,949] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-02-06 03:13:36,030] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-02-06 03:13:36,060] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-02-06 03:13:36,156] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-02-06 03:13:36,306] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-02-06 03:13:36,382] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-02-06 03:13:36,589] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-02-06 03:13:36,608] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-02-06 03:13:36,740] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-02-06 03:13:36,898] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-02-06 03:13:36,927] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-02-06 03:13:37,402] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-02-06 03:13:37,299] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-02-06 03:13:37,610] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-02-06 03:13:37,760] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-02-06 03:13:37,773] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-02-06 03:13:37,964] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-02-06 03:13:37,832] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-02-06 03:13:38,021] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-02-06 03:13:38,064] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-02-06 03:13:38,066] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-02-06 
03:13:38,225] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-02-06 03:13:38,152] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-02-06 03:13:38,260] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-02-06 03:13:38,867] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-02-06 03:13:38,919] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-02-06 03:13:39,361] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-02-06 03:13:39,464] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-02-06 03:13:39,727] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-02-06 03:13:39,907] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-02-06 03:13:40,783] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-02-06 03:13:40,972] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-02-06 03:13:41,035] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-02-06 03:13:41,908] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-02-06 03:13:42,016] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-02-06 03:13:42,016] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-02-06 03:13:42,006] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-02-06 03:13:42,067] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-02-06 03:13:42,220] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-02-06 03:13:42,234] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-02-06 03:13:42,318] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-02-06 03:13:42,410] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-02-06 03:13:42,418] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-02-06 03:13:42,798] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-02-06 03:13:43,021] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-02-06 03:13:43,111] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-02-06 03:13:43,245] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-02-06 03:13:43,302] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-02-06 03:13:43,369] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-02-06 03:13:43,628] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-02-06 03:13:43,698] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-02-06 03:13:44,036] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-02-06 03:13:44,213] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-02-06 03:13:45,248] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-02-06 03:13:45,744] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-02-06 03:13:45,959] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-02-06 
03:13:46,818] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_04_optim_states.pt
[2022-02-06 03:13:47,284] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_58_optim_states.pt
[2022-02-06 03:13:47,621] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_59_optim_states.pt
[2022-02-06 03:13:48,595] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_02_optim_states.pt
[2022-02-06 03:13:48,803] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16700/zero_pp_rank_0_mp_rank_03_optim_states.pt
successfully saved checkpoint at iteration 16700 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
time (ms) | save-checkpoint: 69625.83
iteration 16701/ 292968 | consumed samples: 34203648 | consumed tokens: 17793941504 | elapsed time per iteration (ms): 219060.6 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.640103E+00 | loss scale: 65536.0 | grad norm: 106010.867 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.009 | TFLOPs: 59.43 |
iteration 16702/ 292968 | consumed samples: 34205696 | consumed tokens: 17795956736 | elapsed time per iteration (ms): 148445.0 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.673440E+00 | loss scale: 65536.0 | grad norm: 52361.159 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.71 |
iteration 16703/ 292968 | consumed samples: 34207744 | consumed tokens: 17797971968 | elapsed time per iteration (ms): 148455.5 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.689924E+00 | loss scale: 65536.0 | grad norm: 65946.253 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.70 |
iteration 16704/ 292968 | consumed samples: 34209792 | consumed tokens: 17799987200 | elapsed time per iteration (ms): 148458.0 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.699447E+00 | loss scale: 65536.0 | grad norm: 69911.480 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.70 |
iteration 16705/ 292968 | consumed samples: 34211840 | consumed tokens: 17802002432 | elapsed time per iteration (ms): 149132.9 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.659240E+00 | loss scale: 65536.0 | grad norm: 66669.097 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.30 |
iteration 16706/ 292968 | consumed samples: 34213888 | consumed tokens: 17804017664 | elapsed time per iteration (ms): 150257.1 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.687188E+00 | loss scale: 65536.0 | grad norm: 87812.991 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.65 |
iteration 16707/ 292968 | consumed samples: 34215936 | consumed tokens: 17806032896 | elapsed time per iteration (ms): 148915.0 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.706013E+00 | loss scale: 65536.0 | grad norm: 46195.865 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.43 |
iteration 16708/ 292968 | consumed samples: 34217984 | consumed tokens: 17808048128 | elapsed time per iteration (ms): 148929.3 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.688119E+00 | loss scale: 65536.0 | grad norm: 70662.002 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.42 |
iteration 16709/ 292968 | consumed samples: 34220032 | consumed tokens: 17810063360 | elapsed time per iteration (ms): 155475.2 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.684760E+00 | loss scale: 65536.0 | grad norm: 90776.699 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.74 |
iteration 16710/ 292968 | consumed samples: 34222080 | consumed tokens: 17812078592 | elapsed time per iteration (ms): 148692.1 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.680365E+00 | loss scale: 65536.0 | grad norm: 38610.680 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.56 |
iteration 16711/ 292968 | consumed samples: 34224128 | consumed tokens: 17814093824 | elapsed time per iteration (ms): 148542.5 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.679073E+00 | loss scale: 65536.0 | grad norm: 73406.015 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.65 |
iteration 16712/ 292968 | consumed samples: 34226176 | consumed tokens: 17816109056 | elapsed time per iteration (ms): 149515.1 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.681103E+00 | loss scale: 65536.0 | grad norm: 73550.273 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.08 |
iteration 16713/ 292968 | consumed samples: 34228224 | consumed tokens: 17818124288 | elapsed time per iteration (ms): 149528.9 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.688037E+00 | loss scale: 65536.0 | grad norm: 54117.706 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.07 |
iteration 16714/ 292968 | consumed samples: 34230272 | consumed tokens: 17820139520 | elapsed time per iteration (ms): 148477.8 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.668221E+00 | loss scale: 65536.0 | grad norm: 80516.543 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.69 |
saving checkpoint at iteration 16714 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
[2022-02-06 03:48:55,123] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/mp_rank_00_model_states.pt
[2022-02-06 03:48:55,177] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/mp_rank_01_model_states.pt
[2022-02-06 03:49:26,966] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_99_optim_states.pt
[2022-02-06 03:49:27,776] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_123_optim_states.pt
[2022-02-06 03:49:27,965] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-02-06 03:49:27,980] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-02-06 03:49:28,178] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-02-06 03:49:28,229] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-02-06 03:49:28,281] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-02-06 03:49:28,633] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-02-06 03:49:28,973] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-02-06 03:49:28,990] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-02-06 03:49:29,088] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-02-06 03:49:29,280] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-02-06 03:49:29,939] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-02-06 03:49:30,510] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-02-06 03:49:30,576] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-02-06 03:49:30,588] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-02-06 03:49:30,688] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-02-06 03:49:30,855] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_124_optim_states.pt [2022-02-06 03:49:31,018] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-02-06 03:49:31,006] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_08_optim_states.pt 
[2022-02-06 03:49:31,011] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-02-06 03:49:31,043] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-02-06 03:49:31,084] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-02-06 03:49:31,253] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-02-06 03:49:31,346] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-02-06 03:49:31,357] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-02-06 03:49:31,558] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-02-06 03:49:31,736] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-02-06 03:49:31,738] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-02-06 03:49:31,936] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-02-06 03:49:32,059] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-02-06 03:49:32,108] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-02-06 03:49:32,295] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-02-06 03:49:32,375] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-02-06 03:49:32,441] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-02-06 03:49:32,680] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-02-06 03:49:32,653] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-02-06 03:49:32,787] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-02-06 03:49:33,112] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-02-06 03:49:33,251] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-02-06 03:49:33,243] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-02-06 03:49:33,336] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-02-06 03:49:33,478] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-02-06 03:49:33,501] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-02-06 03:49:34,352] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-02-06 03:49:34,359] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-02-06 03:49:34,281] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-02-06 03:49:34,389] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-02-06 03:49:34,524] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-02-06 03:49:34,530] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-02-06 03:49:34,630] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-02-06 03:49:34,515] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-02-06 03:49:34,700] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-02-06 03:49:34,724] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-02-06 03:49:35,030] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-02-06 
03:49:35,085] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-02-06 03:49:35,430] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-02-06 03:49:35,450] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-02-06 03:49:35,512] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-02-06 03:49:35,568] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-02-06 03:49:36,503] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-02-06 03:49:36,561] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-02-06 03:49:36,973] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-02-06 03:49:37,947] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-02-06 03:49:38,183] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-02-06 03:49:38,244] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-02-06 03:49:38,594] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-02-06 03:49:38,601] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-02-06 03:49:38,819] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-02-06 03:49:38,917] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-02-06 03:49:38,809] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-02-06 03:49:39,018] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-02-06 03:49:39,189] [INFO] [engine.py:3023:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-02-06 03:49:39,272] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-02-06 03:49:39,350] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-02-06 03:49:39,410] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-02-06 03:49:39,518] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-02-06 03:49:39,569] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-02-06 03:49:39,785] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-02-06 03:49:39,712] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-02-06 03:49:40,020] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-02-06 03:49:40,458] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-02-06 03:49:41,002] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-02-06 03:49:41,181] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-02-06 03:49:41,214] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-02-06 03:49:41,444] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-02-06 03:49:41,489] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-02-06 03:49:41,833] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-02-06 03:49:41,937] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-02-06 03:49:41,825] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-02-06 
03:49:42,079] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-02-06 03:49:42,212] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-02-06 03:49:42,346] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-02-06 03:49:42,412] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-02-06 03:49:42,749] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-02-06 03:49:42,750] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-02-06 03:49:42,760] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-02-06 03:49:42,814] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-02-06 03:49:42,856] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-02-06 03:49:42,859] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-02-06 03:49:42,969] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-02-06 03:49:42,973] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-02-06 03:49:43,255] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-02-06 03:49:43,423] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-02-06 03:49:43,467] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-02-06 03:49:43,493] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-02-06 03:49:43,344] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-02-06 03:49:43,565] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-02-06 03:49:43,708] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-02-06 03:49:43,777] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-02-06 03:49:44,045] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-02-06 03:49:44,076] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-02-06 03:49:44,123] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-02-06 03:49:44,254] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-02-06 03:49:44,279] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-02-06 03:49:44,631] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-02-06 03:49:44,851] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-02-06 03:49:44,955] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-02-06 03:49:45,261] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-02-06 03:49:45,340] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-02-06 03:49:46,271] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-02-06 03:49:46,358] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-02-06 03:49:47,016] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-02-06 03:49:47,115] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-02-06 03:49:47,577] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-02-06 
03:49:47,694] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-02-06 03:49:51,281] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-02-06 03:49:51,408] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16714/zero_pp_rank_0_mp_rank_03_optim_states.pt successfully saved checkpoint at iteration 16714 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 time (ms) | save-checkpoint: 70345.81 [exiting program after 1186.2843316952387 minutes] datetime: 2022-02-06 03:49:51 /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun. Note that --use_env is set by default in torchrun. If your script expects `--local_rank` argument to be set, please change it to read from `os.environ['LOCAL_RANK']` instead. See https://pytorch.org/docs/stable/distributed.html#launch-utility for further instructions warnings.warn( /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun. Note that --use_env is set by default in torchrun. If your script expects `--local_rank` argument to be set, please change it to read from `os.environ['LOCAL_RANK']` instead. 
See https://pytorch.org/docs/stable/distributed.html#launch-utility for further instructions warnings.warn( WARNING:torch.distributed.run: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
using world size: 128, data-parallel-size: 1, tensor-model-parallel size: 4, pipeline-model-parallel size: 32
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.95
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. None
consumed_train_samples .......................... 0
consumed_train_tokens ........................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
curriculum_learning ............................. False
data_impl ....................................... mmap
data_parallel_size .............................. 1
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.1770260.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embed_layernorm ................................. True
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 150
eval_iters ...................................... 5
eval_only ....................................... None
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1185
exit_interval ................................... None
ffn_hidden_size ................................. 46400
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
gigaflos_no_embeds .............................. 0
global_batch_size ............................... 2048
glu_activation .................................. None
hidden_dropout .................................. 0.1
hidden_size ..................................... 11600
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.006
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 145
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 1
log_learning_rate_to_tensorboard ................ True
log_level ....................................... None
log_level_replica ............................... None
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_on_targets_only ............................ False
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 6e-05
lr_decay_iters .................................. None
lr_decay_samples ................................ None
lr_decay_style .................................. cosine
lr_decay_tokens ................................. 260000000000
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 3750000
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... 2048
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt
micro_batch_size ................................ 1
min_loss_scale .................................. 1.0
min_lr .......................................... 6e-06
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 80
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 64
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ...........................
False params_dtype .................................... torch.float16 partition_activations ........................... False patch_dim ....................................... 16 pipeline_model_parallel_size .................... 32 position_embedding_type ......................... PositionEmbeddingType.absolute profile_backward ................................ False query_in_block_prob ............................. 0.1 rampup_batch_size ............................... None rank ............................................ 0 remote_device ................................... none reset_attention_mask ............................ False reset_position_ids .............................. False retriever_report_topk_accuracies ................ [] retriever_score_scaling ......................... False retriever_seq_length ............................ 256 reweight_loss_based_on_position_frequency ....... False sample_rate ..................................... 1.0 save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 save_interval ................................... 50 scatter_gather_tensors_in_pipeline .............. True scattered_embeddings ............................ False seed ............................................ 43 seq_length ...................................... 2048 sgd_momentum .................................... 0.9 short_seq_prob .................................. 0.1 skip_train_iteration_range ...................... None split ........................................... 949,50,1 split_transformers .............................. False synchronize_each_layer .......................... False tensor_model_parallel_size ...................... 4 tensorboard_debug_dir ........................... None tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs//tensorboard/cl-a100 tensorboard_log_interval ........................ 
1 tensorboard_queue_size .......................... 5 test_weighted_split_names ....................... None test_weighted_split_paths ....................... None test_weighted_split_splits ...................... None test_weighted_split_weights ..................... None tile_factor ..................................... 1 titles_data_path ................................ None tokenizer_name_or_path .......................... None tokenizer_type .................................. GPT2BPETokenizer train_iters ..................................... None train_samples ................................... 600000000 train_tokens .................................... 300000000000 train_weighted_split_paths ...................... None use_bnb_optimizer ............................... False use_checkpoint_lr_scheduler ..................... False use_contiguous_buffers_in_ddp ................... False use_cpu_initialization .......................... None use_one_sent_docs ............................... False use_pin_memory .................................. False valid_weighted_split_names ...................... None valid_weighted_split_paths ...................... None valid_weighted_split_splits ..................... None valid_weighted_split_weights .................... None virtual_pipeline_model_parallel_size ............ None vocab_extra_ids ................................. 0 vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json weight_decay .................................... 0.1 world_size ...................................... 128 zero_allgather_bucket_size ...................... 0.0 zero_contigious_gradients ....................... False zero_reduce_bucket_size ......................... 0.0 zero_reduce_scatter ............................. False zero_stage ...................................... 
1 -------------------- end of arguments --------------------- setting number of micro-batches to constant 2048 > building GPT2BPETokenizer tokenizer ... > padded vocab (size: 50257) with 431 dummy tokens (new size: 50688) -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] cpu_adagrad ............ [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] > setting tensorboard ... sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] torch version .................... 1.10.2 torch cuda version ............... 11.3 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] deepspeed info ................... 0.6.0+ba9c4cc7, ba9c4cc7, master deepspeed wheel compiled w. ...... torch 1.10, cuda 11.3 **** Git info for Megatron: git_hash=f689231 git_branch=log-grad-norm **** > initializing torch distributed ... > initializing tensor model parallel with size 4 > initializing pipeline model parallel with size 32 > setting random seeds to 43 ... [2022-02-06 03:50:30,717] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2761 and data parallel seed: 43 > compiling dataset index builder ... make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/data' make: Nothing to be done for 'default'. make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/data' >>> done with dataset index builder. Compilation time: 0.126 seconds > compiling and loading fused kernels ... /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:295: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! 
warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module scaled_upper_triang_masked_softmax_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module scaled_upper_triang_masked_softmax_cuda... Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module scaled_masked_softmax_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module scaled_masked_softmax_cuda... 
Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module fused_mix_prec_layer_norm_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module fused_mix_prec_layer_norm_cuda... >>> done with compiling and loading fused kernels. Compilation time: 7.118 seconds time to initialize megatron (seconds): -2.037 [after megatron is initialized] datetime: 2022-02-06 03:50:37 building GPT model ... 
[2022-02-06 03:50:38,021] [INFO] [utils.py:824:see_memory_usage] Before Building Model [2022-02-06 03:50:38,022] [INFO] [utils.py:825:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB [2022-02-06 03:50:38,022] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 49.45 GB, percent = 9.8% SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=1, data=0, model=0): 4, ProcessCoord(pipe=1, data=0, model=1): 5, ProcessCoord(pipe=1, data=0, model=2): 6, ProcessCoord(pipe=1, data=0, model=3): 7, ProcessCoord(pipe=2, data=0, model=0): 8, ProcessCoord(pipe=2, data=0, model=1): 9, ProcessCoord(pipe=2, data=0, model=2): 10, ProcessCoord(pipe=2, data=0, model=3): 11, ProcessCoord(pipe=3, data=0, model=0): 12, ProcessCoord(pipe=3, data=0, model=1): 13, ProcessCoord(pipe=3, data=0, model=2): 14, ProcessCoord(pipe=3, data=0, model=3): 15, ProcessCoord(pipe=4, data=0, model=0): 16, ProcessCoord(pipe=4, data=0, model=1): 17, ProcessCoord(pipe=4, data=0, model=2): 18, ProcessCoord(pipe=4, data=0, model=3): 19, ProcessCoord(pipe=5, data=0, model=0): 20, ProcessCoord(pipe=5, data=0, model=1): 21, ProcessCoord(pipe=5, data=0, model=2): 22, ProcessCoord(pipe=5, data=0, model=3): 23, ProcessCoord(pipe=6, data=0, model=0): 24, ProcessCoord(pipe=6, data=0, model=1): 25, ProcessCoord(pipe=6, data=0, model=2): 26, ProcessCoord(pipe=6, data=0, model=3): 27, ProcessCoord(pipe=7, data=0, model=0): 28, ProcessCoord(pipe=7, data=0, model=1): 29, ProcessCoord(pipe=7, data=0, model=2): 30, ProcessCoord(pipe=7, data=0, model=3): 31, ProcessCoord(pipe=8, data=0, model=0): 32, ProcessCoord(pipe=8, data=0, model=1): 33, ProcessCoord(pipe=8, data=0, model=2): 34, ProcessCoord(pipe=8, data=0, model=3): 35, ProcessCoord(pipe=9, data=0, model=0): 36, ProcessCoord(pipe=9, data=0, model=1): 
37, ProcessCoord(pipe=9, data=0, model=2): 38, ProcessCoord(pipe=9, data=0, model=3): 39, ProcessCoord(pipe=10, data=0, model=0): 40, ProcessCoord(pipe=10, data=0, model=1): 41, ProcessCoord(pipe=10, data=0, model=2): 42, ProcessCoord(pipe=10, data=0, model=3): 43, ProcessCoord(pipe=11, data=0, model=0): 44, ProcessCoord(pipe=11, data=0, model=1): 45, ProcessCoord(pipe=11, data=0, model=2): 46, ProcessCoord(pipe=11, data=0, model=3): 47, ProcessCoord(pipe=12, data=0, model=0): 48, ProcessCoord(pipe=12, data=0, model=1): 49, ProcessCoord(pipe=12, data=0, model=2): 50, ProcessCoord(pipe=12, data=0, model=3): 51, ProcessCoord(pipe=13, data=0, model=0): 52, ProcessCoord(pipe=13, data=0, model=1): 53, ProcessCoord(pipe=13, data=0, model=2): 54, ProcessCoord(pipe=13, data=0, model=3): 55, ProcessCoord(pipe=14, data=0, model=0): 56, ProcessCoord(pipe=14, data=0, model=1): 57, ProcessCoord(pipe=14, data=0, model=2): 58, ProcessCoord(pipe=14, data=0, model=3): 59, ProcessCoord(pipe=15, data=0, model=0): 60, ProcessCoord(pipe=15, data=0, model=1): 61, ProcessCoord(pipe=15, data=0, model=2): 62, ProcessCoord(pipe=15, data=0, model=3): 63, ProcessCoord(pipe=16, data=0, model=0): 64, ProcessCoord(pipe=16, data=0, model=1): 65, ProcessCoord(pipe=16, data=0, model=2): 66, ProcessCoord(pipe=16, data=0, model=3): 67, ProcessCoord(pipe=17, data=0, model=0): 68, ProcessCoord(pipe=17, data=0, model=1): 69, ProcessCoord(pipe=17, data=0, model=2): 70, ProcessCoord(pipe=17, data=0, model=3): 71, ProcessCoord(pipe=18, data=0, model=0): 72, ProcessCoord(pipe=18, data=0, model=1): 73, ProcessCoord(pipe=18, data=0, model=2): 74, ProcessCoord(pipe=18, data=0, model=3): 75, ProcessCoord(pipe=19, data=0, model=0): 76, ProcessCoord(pipe=19, data=0, model=1): 77, ProcessCoord(pipe=19, data=0, model=2): 78, ProcessCoord(pipe=19, data=0, model=3): 79, ProcessCoord(pipe=20, data=0, model=0): 80, ProcessCoord(pipe=20, data=0, model=1): 81, ProcessCoord(pipe=20, data=0, model=2): 82, 
ProcessCoord(pipe=20, data=0, model=3): 83, ProcessCoord(pipe=21, data=0, model=0): 84, ProcessCoord(pipe=21, data=0, model=1): 85, ProcessCoord(pipe=21, data=0, model=2): 86, ProcessCoord(pipe=21, data=0, model=3): 87, ProcessCoord(pipe=22, data=0, model=0): 88, ProcessCoord(pipe=22, data=0, model=1): 89, ProcessCoord(pipe=22, data=0, model=2): 90, ProcessCoord(pipe=22, data=0, model=3): 91, ProcessCoord(pipe=23, data=0, model=0): 92, ProcessCoord(pipe=23, data=0, model=1): 93, ProcessCoord(pipe=23, data=0, model=2): 94, ProcessCoord(pipe=23, data=0, model=3): 95, ProcessCoord(pipe=24, data=0, model=0): 96, ProcessCoord(pipe=24, data=0, model=1): 97, ProcessCoord(pipe=24, data=0, model=2): 98, ProcessCoord(pipe=24, data=0, model=3): 99, ProcessCoord(pipe=25, data=0, model=0): 100, ProcessCoord(pipe=25, data=0, model=1): 101, ProcessCoord(pipe=25, data=0, model=2): 102, ProcessCoord(pipe=25, data=0, model=3): 103, ProcessCoord(pipe=26, data=0, model=0): 104, ProcessCoord(pipe=26, data=0, model=1): 105, ProcessCoord(pipe=26, data=0, model=2): 106, ProcessCoord(pipe=26, data=0, model=3): 107, ProcessCoord(pipe=27, data=0, model=0): 108, ProcessCoord(pipe=27, data=0, model=1): 109, ProcessCoord(pipe=27, data=0, model=2): 110, ProcessCoord(pipe=27, data=0, model=3): 111, ProcessCoord(pipe=28, data=0, model=0): 112, ProcessCoord(pipe=28, data=0, model=1): 113, ProcessCoord(pipe=28, data=0, model=2): 114, ProcessCoord(pipe=28, data=0, model=3): 115, ProcessCoord(pipe=29, data=0, model=0): 116, ProcessCoord(pipe=29, data=0, model=1): 117, ProcessCoord(pipe=29, data=0, model=2): 118, ProcessCoord(pipe=29, data=0, model=3): 119, ProcessCoord(pipe=30, data=0, model=0): 120, ProcessCoord(pipe=30, data=0, model=1): 121, ProcessCoord(pipe=30, data=0, model=2): 122, ProcessCoord(pipe=30, data=0, model=3): 123, ProcessCoord(pipe=31, data=0, model=0): 124, ProcessCoord(pipe=31, data=0, model=1): 125, ProcessCoord(pipe=31, data=0, model=2): 126, ProcessCoord(pipe=31, data=0, 
model=3): 127} [2022-02-06 03:50:39,735] [INFO] [module.py:365:_partition_layers] Partitioning pipeline stages with method type:transformer stage=0 layers=5 0: _to_float16 1: EmbeddingPipe 2: 3: ParallelTransformerLayerPipe 4: ParallelTransformerLayerPipe stage=1 layers=2 5: ParallelTransformerLayerPipe 6: ParallelTransformerLayerPipe stage=2 layers=2 7: ParallelTransformerLayerPipe 8: ParallelTransformerLayerPipe stage=3 layers=2 9: ParallelTransformerLayerPipe 10: ParallelTransformerLayerPipe stage=4 layers=2 11: ParallelTransformerLayerPipe 12: ParallelTransformerLayerPipe stage=5 layers=2 13: ParallelTransformerLayerPipe 14: ParallelTransformerLayerPipe stage=6 layers=2 15: ParallelTransformerLayerPipe 16: ParallelTransformerLayerPipe stage=7 layers=2 17: ParallelTransformerLayerPipe 18: ParallelTransformerLayerPipe stage=8 layers=2 19: ParallelTransformerLayerPipe 20: ParallelTransformerLayerPipe stage=9 layers=2 21: ParallelTransformerLayerPipe 22: ParallelTransformerLayerPipe stage=10 layers=2 23: ParallelTransformerLayerPipe 24: ParallelTransformerLayerPipe stage=11 layers=2 25: ParallelTransformerLayerPipe 26: ParallelTransformerLayerPipe stage=12 layers=2 27: ParallelTransformerLayerPipe 28: ParallelTransformerLayerPipe stage=13 layers=2 29: ParallelTransformerLayerPipe 30: ParallelTransformerLayerPipe stage=14 layers=2 31: ParallelTransformerLayerPipe 32: ParallelTransformerLayerPipe stage=15 layers=2 33: ParallelTransformerLayerPipe 34: ParallelTransformerLayerPipe stage=16 layers=2 35: ParallelTransformerLayerPipe 36: ParallelTransformerLayerPipe stage=17 layers=2 37: ParallelTransformerLayerPipe 38: ParallelTransformerLayerPipe stage=18 layers=2 39: ParallelTransformerLayerPipe 40: ParallelTransformerLayerPipe stage=19 layers=2 41: ParallelTransformerLayerPipe 42: ParallelTransformerLayerPipe stage=20 layers=2 43: ParallelTransformerLayerPipe 44: ParallelTransformerLayerPipe stage=21 layers=2 45: ParallelTransformerLayerPipe 46: 
ParallelTransformerLayerPipe stage=22 layers=2 47: ParallelTransformerLayerPipe 48: ParallelTransformerLayerPipe stage=23 layers=2 49: ParallelTransformerLayerPipe 50: ParallelTransformerLayerPipe stage=24 layers=2 51: ParallelTransformerLayerPipe 52: ParallelTransformerLayerPipe stage=25 layers=2 53: ParallelTransformerLayerPipe 54: ParallelTransformerLayerPipe stage=26 layers=2 55: ParallelTransformerLayerPipe 56: ParallelTransformerLayerPipe stage=27 layers=2 57: ParallelTransformerLayerPipe 58: ParallelTransformerLayerPipe stage=28 layers=2 59: ParallelTransformerLayerPipe 60: ParallelTransformerLayerPipe stage=29 layers=2 61: ParallelTransformerLayerPipe 62: ParallelTransformerLayerPipe stage=30 layers=2 63: ParallelTransformerLayerPipe 64: ParallelTransformerLayerPipe stage=31 layers=6 65: ParallelTransformerLayerPipe 66: ParallelTransformerLayerPipe 67: 68: MixedFusedLayerNorm 69: EmbeddingPipe 70: float16_to_fp32 loss: CrossEntropy [2022-02-06 03:50:40,344] [INFO] [utils.py:824:see_memory_usage] After Building Model [2022-02-06 03:50:40,344] [INFO] [utils.py:825:see_memory_usage] MA 1.88 GB Max_MA 1.88 GB CA 1.91 GB Max_CA 2 GB [2022-02-06 03:50:40,344] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 50.08 GB, percent = 10.0% setting training iterations to 292968 > learning rate decay style: cosine DeepSpeed is enabled. 
[2022-02-06 03:50:40,437] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed info: version=0.6.0+ba9c4cc7, git-hash=ba9c4cc7, git-branch=master [2022-02-06 03:50:41,091] [INFO] [engine.py:275:__init__] DeepSpeed Flops Profiler Enabled: False [2022-02-06 03:50:41,091] [INFO] [engine.py:1099:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer [2022-02-06 03:50:41,091] [INFO] [engine.py:1105:_configure_optimizer] Using client Optimizer as basic optimizer [2022-02-06 03:50:41,092] [INFO] [engine.py:1121:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam [2022-02-06 03:50:41,092] [INFO] [utils.py:48:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type= [2022-02-06 03:50:41,092] [INFO] [logging.py:69:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer [2022-02-06 03:50:41,092] [INFO] [stage_1_and_2.py:121:__init__] Reduce bucket size 500000000 [2022-02-06 03:50:41,092] [INFO] [stage_1_and_2.py:122:__init__] Allgather bucket size 500000000 [2022-02-06 03:50:41,092] [INFO] [stage_1_and_2.py:123:__init__] CPU Offload: False [2022-02-06 03:50:41,092] [INFO] [stage_1_and_2.py:124:__init__] Round robin gradient partitioning: False Rank: 47 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 46 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 105 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 42 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 43 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 111 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 110 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 107 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 106 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 104 partition count [1, 1] and sizes[(807360000, False), 
(179800, False)] Rank: 97 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 34 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 99 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 98 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 96 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 35 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 18 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 103 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 102 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 19 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 41 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 40 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 45 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 44 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 101 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 100 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 109 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 108 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 6 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 66 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 7 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 13 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 12 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 67 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 58 partition count [1, 1] and sizes[(807360000, False), (179800, False)] 
Rank: 59 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 61 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 60 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 38 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 39 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 11 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 31 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 10 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 30 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 22 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 23 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 8 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 75 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 74 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 9 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 78 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 79 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 76 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 77 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 83 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 82 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 62 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 87 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 63 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 86 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 115 partition count 
[1, 1] and sizes[(807360000, False), (179800, False)] Rank: 57 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 56 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 85 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 114 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 84 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 21 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 118 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 119 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 20 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 37 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 36 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 91 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 52 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 70 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 71 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 54 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 90 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 55 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 53 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 94 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 95 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 68 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 69 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 116 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 117 partition count [1, 1] and 
sizes[(807360000, False), (179800, False)] Rank: 93 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 5 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 4 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 121 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 29 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 120 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 48 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 28 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 92 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 14 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 15 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 49 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 89 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 64 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 51 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 65 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 88 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 72 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 50 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 112 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 113 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 32 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 81 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 33 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 80 partition count [1, 1] and sizes[(807360000, 
False), (179800, False)] Rank: 73 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 16 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 27 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 17 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 26 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 123 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 122 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 24 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 25 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 1 partition count [1, 1] and sizes[(978123600, False), (191400, False)] Rank: 0 partition count [1, 1] and sizes[(978123600, False), (191400, False)] Rank: 2 partition count [1, 1] and sizes[(978123600, False), (191400, False)] Rank: 3 partition count [1, 1] and sizes[(978123600, False), (191400, False)] Rank: 124 partition count [1, 1] and sizes[(978123600, False), (214600, False)] Rank: 125 partition count [1, 1] and sizes[(978123600, False), (214600, False)] Rank: 126 partition count [1, 1] and sizes[(978123600, False), (214600, False)] Rank: 127 partition count [1, 1] and sizes[(978123600, False), (214600, False)] [2022-02-06 03:50:46,469] [INFO] [utils.py:824:see_memory_usage] Before initializing optimizer states [2022-02-06 03:50:46,470] [INFO] [utils.py:825:see_memory_usage] MA 5.47 GB Max_MA 7.29 GB CA 9.25 GB Max_CA 9 GB [2022-02-06 03:50:46,470] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 49.92 GB, percent = 9.9% [2022-02-06 03:50:46,535] [INFO] [utils.py:824:see_memory_usage] After initializing optimizer states [2022-02-06 03:50:46,535] [INFO] [utils.py:825:see_memory_usage] MA 12.76 GB Max_MA 16.41 GB CA 20.19 GB Max_CA 20 GB [2022-02-06 03:50:46,535] [INFO] [utils.py:833:see_memory_usage] CPU Virtual 
Memory: used = 49.92 GB, percent = 9.9%
[2022-02-06 03:50:46,535] [INFO] [stage_1_and_2.py:493:__init__] optimizer state initialized
[2022-02-06 03:50:46,563] [INFO] [utils.py:824:see_memory_usage] After initializing ZeRO optimizer
[2022-02-06 03:50:46,563] [INFO] [utils.py:825:see_memory_usage] MA 12.76 GB Max_MA 12.76 GB CA 20.19 GB Max_CA 20 GB
[2022-02-06 03:50:46,563] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 49.92 GB, percent = 9.9%
[2022-02-06 03:50:46,564] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2022-02-06 03:50:46,564] [INFO] [engine.py:805:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2022-02-06 03:50:46,564] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2022-02-06 03:50:46,564] [INFO] [logging.py:69:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.95), (0.9, 0.95)]
[2022-02-06 03:50:46,564] [INFO] [config.py:1058:print] DeepSpeedEngine configuration:
[2022-02-06 03:50:46,564] [INFO] [config.py:1062:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
[2022-02-06 03:50:46,564] [INFO] [config.py:1062:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2022-02-06 03:50:46,564] [INFO] [config.py:1062:print] amp_enabled .................. False
[2022-02-06 03:50:46,564] [INFO] [config.py:1062:print] amp_params ................... False
[2022-02-06 03:50:46,564] [INFO] [config.py:1062:print] autotuning_config ............ { "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": null, "exps_dir": null, "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 }
[2022-02-06 03:50:46,564] [INFO] [config.py:1062:print] bfloat16_enabled ............. False
[2022-02-06 03:50:46,564] [INFO] [config.py:1062:print] checkpoint_tag_validation_enabled True
[2022-02-06 03:50:46,564] [INFO] [config.py:1062:print] checkpoint_tag_validation_fail False
[2022-02-06 03:50:46,564] [INFO] [config.py:1062:print] communication_data_type ...... None
[2022-02-06 03:50:46,564] [INFO] [config.py:1062:print] curriculum_enabled ........... True
[2022-02-06 03:50:46,564] [INFO] [config.py:1062:print] curriculum_params ............ {'curriculum_type': 'seqlen', 'min_difficulty': 64, 'max_difficulty': 2048, 'schedule_type': 'fixed_linear', 'schedule_config': {'total_curriculum_step': 36000, 'difficulty_step': 8}}
[2022-02-06 03:50:46,564] [INFO] [config.py:1062:print] dataloader_drop_last ......... False
[2022-02-06 03:50:46,565] [INFO] [config.py:1062:print] disable_allgather ............ False
[2022-02-06 03:50:46,565] [INFO] [config.py:1062:print] dump_state ................... False
[2022-02-06 03:50:46,565] [INFO] [config.py:1062:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2022-02-06 03:50:46,565] [INFO] [config.py:1062:print] eigenvalue_enabled ........... False
[2022-02-06 03:50:46,565] [INFO] [config.py:1062:print] eigenvalue_gas_boundary_resolution 1
[2022-02-06 03:50:46,565] [INFO] [config.py:1062:print] eigenvalue_layer_name ........ bert.encoder.layer
[2022-02-06 03:50:46,565] [INFO] [config.py:1062:print] eigenvalue_layer_num ......... 0
[2022-02-06 03:50:46,565] [INFO] [config.py:1062:print] eigenvalue_max_iter .......... 100
[2022-02-06 03:50:46,565] [INFO] [config.py:1062:print] eigenvalue_stability ......... 1e-06
[2022-02-06 03:50:46,565] [INFO] [config.py:1062:print] eigenvalue_tol ............... 0.01
[2022-02-06 03:50:46,565] [INFO] [config.py:1062:print] eigenvalue_verbose ........... False
[2022-02-06 03:50:46,565] [INFO] [config.py:1062:print] elasticity_enabled ........... False
[2022-02-06 03:50:46,565] [INFO] [config.py:1062:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2022-02-06 03:50:46,565] [INFO] [config.py:1062:print] fp16_enabled ................. True
[2022-02-06 03:50:46,565] [INFO] [config.py:1062:print] fp16_master_weights_and_gradients False
[2022-02-06 03:50:46,565] [INFO] [config.py:1062:print] fp16_mixed_quantize .......... False
[2022-02-06 03:50:46,565] [INFO] [config.py:1062:print] global_rank .................. 0
[2022-02-06 03:50:46,565] [INFO] [config.py:1062:print] gradient_accumulation_steps .. 2048
[2022-02-06 03:50:46,565] [INFO] [config.py:1062:print] gradient_clipping ............ 1.0
[2022-02-06 03:50:46,565] [INFO] [config.py:1062:print] gradient_predivide_factor .... 1.0
[2022-02-06 03:50:46,565] [INFO] [config.py:1062:print] initial_dynamic_scale ........ 4096
[2022-02-06 03:50:46,565] [INFO] [config.py:1062:print] loss_scale ................... 0
[2022-02-06 03:50:46,565] [INFO] [config.py:1062:print] memory_breakdown ............. False
[2022-02-06 03:50:46,565] [INFO] [config.py:1062:print] optimizer_legacy_fusion ...... False
[2022-02-06 03:50:46,565] [INFO] [config.py:1062:print] optimizer_name ............... None
[2022-02-06 03:50:46,565] [INFO] [config.py:1062:print] optimizer_params ............. None
[2022-02-06 03:50:46,565] [INFO] [config.py:1062:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2022-02-06 03:50:46,565] [INFO] [config.py:1062:print] pld_enabled .................. False
[2022-02-06 03:50:46,565] [INFO] [config.py:1062:print] pld_params ................... False
[2022-02-06 03:50:46,565] [INFO] [config.py:1062:print] prescale_gradients ........... False
[2022-02-06 03:50:46,565] [INFO] [config.py:1062:print] quantize_change_rate ......... 0.001
[2022-02-06 03:50:46,565] [INFO] [config.py:1062:print] quantize_groups .............. 1
[2022-02-06 03:50:46,565] [INFO] [config.py:1062:print] quantize_offset .............. 1000
[2022-02-06 03:50:46,565] [INFO] [config.py:1062:print] quantize_period .............. 1000
[2022-02-06 03:50:46,565] [INFO] [config.py:1062:print] quantize_rounding ............ 0
[2022-02-06 03:50:46,565] [INFO] [config.py:1062:print] quantize_start_bits .......... 16
[2022-02-06 03:50:46,565] [INFO] [config.py:1062:print] quantize_target_bits ......... 8
[2022-02-06 03:50:46,566] [INFO] [config.py:1062:print] quantize_training_enabled .... False
[2022-02-06 03:50:46,566] [INFO] [config.py:1062:print] quantize_type ................ 0
[2022-02-06 03:50:46,566] [INFO] [config.py:1062:print] quantize_verbose ............. False
[2022-02-06 03:50:46,566] [INFO] [config.py:1062:print] scheduler_name ............... None
[2022-02-06 03:50:46,566] [INFO] [config.py:1062:print] scheduler_params ............. None
[2022-02-06 03:50:46,566] [INFO] [config.py:1062:print] sparse_attention ............. None
[2022-02-06 03:50:46,566] [INFO] [config.py:1062:print] sparse_gradients_enabled ..... False
[2022-02-06 03:50:46,566] [INFO] [config.py:1062:print] steps_per_print .............. 2000
[2022-02-06 03:50:46,566] [INFO] [config.py:1062:print] tensorboard_enabled .......... False
[2022-02-06 03:50:46,566] [INFO] [config.py:1062:print] tensorboard_job_name ......... DeepSpeedJobName
[2022-02-06 03:50:46,566] [INFO] [config.py:1062:print] tensorboard_output_path ......
[2022-02-06 03:50:46,566] [INFO] [config.py:1062:print] train_batch_size ............. 2048
[2022-02-06 03:50:46,566] [INFO] [config.py:1062:print] train_micro_batch_size_per_gpu 1
[2022-02-06 03:50:46,566] [INFO] [config.py:1062:print] use_quantizer_kernel ......... False
[2022-02-06 03:50:46,566] [INFO] [config.py:1062:print] wall_clock_breakdown ......... False
[2022-02-06 03:50:46,566] [INFO] [config.py:1062:print] world_size ................... 1
[2022-02-06 03:50:46,566] [INFO] [config.py:1062:print] zero_allow_untested_optimizer False
[2022-02-06 03:50:46,566] [INFO] [config.py:1062:print] zero_config .................. { "stage": 1, "contiguous_gradients": true, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": false, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_16bit_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false }
[2022-02-06 03:50:46,566] [INFO] [config.py:1062:print] zero_enabled ................. True
[2022-02-06 03:50:46,566] [INFO] [config.py:1062:print] zero_optimization_stage ...... 1
[2022-02-06 03:50:46,566] [INFO] [config.py:1064:print] json = { "train_micro_batch_size_per_gpu": 1, "train_batch_size": 2.048000e+03, "gradient_clipping": 1.0, "elastic_checkpoint": true, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "curriculum_learning": { "enabled": true, "curriculum_type": "seqlen", "min_difficulty": 64, "max_difficulty": 2.048000e+03, "schedule_type": "fixed_linear", "schedule_config": { "total_curriculum_step": 3.600000e+04, "difficulty_step": 8 } }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false }
[2022-02-06 03:50:46,566] [INFO] [engine.py:93:__init__] CONFIG: micro_batches=2048 micro_batch_size=1
[2022-02-06 03:50:48,912] [INFO] [engine.py:151:__init__] RANK=0 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-06 03:50:48,912] [INFO] [engine.py:151:__init__] RANK=4 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-06 03:50:48,912] [INFO] [engine.py:151:__init__] RANK=3 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-06 03:50:48,912] [INFO] [engine.py:151:__init__] RANK=7 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-06 03:50:48,912] [INFO] [engine.py:151:__init__] RANK=6 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-06 03:50:48,912] [INFO] [engine.py:151:__init__] RANK=2 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000
(104048.288M) [2022-02-06 03:50:48,912] [INFO] [engine.py:151:__init__] RANK=5 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,912] [INFO] [engine.py:151:__init__] RANK=1 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,912] [INFO] [engine.py:151:__init__] RANK=70 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,912] [INFO] [engine.py:151:__init__] RANK=71 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,912] [INFO] [engine.py:151:__init__] RANK=69 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,912] [INFO] [engine.py:151:__init__] RANK=67 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,912] [INFO] [engine.py:151:__init__] RANK=100 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,912] [INFO] [engine.py:151:__init__] RANK=102 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,912] [INFO] [engine.py:151:__init__] RANK=101 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,912] [INFO] [engine.py:151:__init__] RANK=103 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 
(807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,912] [INFO] [engine.py:151:__init__] RANK=64 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,912] [INFO] [engine.py:151:__init__] RANK=68 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,912] [INFO] [engine.py:151:__init__] RANK=66 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,912] [INFO] [engine.py:151:__init__] RANK=65 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,912] [INFO] [engine.py:151:__init__] RANK=97 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,912] [INFO] [engine.py:151:__init__] RANK=99 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,912] [INFO] [engine.py:151:__init__] RANK=98 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=39 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=38 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] 
[engine.py:151:__init__] RANK=36 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=52 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=51 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=118 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=96 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=115 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=117 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=34 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=37 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=35 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 
(104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=126 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=123 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=57 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=62 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=63 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=21 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=22 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=82 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=87 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=84 
STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=83 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=114 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=119 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=113 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=116 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=95 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=89 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=92 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=90 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) 
UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=32 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=127 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=124 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=46 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=44 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=47 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=45 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=110 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=109 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=108 STAGE=27 
LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=106 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=48 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=54 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=55 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=56 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=31 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=30 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=23 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=20 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 
(104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=86 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=81 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=74 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=78 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=76 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=79 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=112 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=88 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=94 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=93 STAGE=23 LAYERS=2 [49, 51) 
STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=33 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=122 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=120 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=40 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=12 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=13 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=105 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=104 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=107 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 
03:50:48,913] [INFO] [engine.py:151:__init__] RANK=111 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=50 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=53 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=60 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=29 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=18 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=19 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=85 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=77 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=75 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) 
TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=91 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=125 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=42 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=10 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=14 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=49 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=59 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=25 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=28 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] 
[engine.py:151:__init__] RANK=17 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=80 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=73 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=121 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=43 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=41 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=15 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=11 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=9 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=58 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) 
UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=27 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=24 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=16 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=72 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=8 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=61 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-06 03:50:48,913] [INFO] [engine.py:151:__init__] RANK=26 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
> using checkpoint value 6e-05 for learning rate
> using checkpoint value 6e-06 for minimum learning rate
> using checkpoint value 3750000 for warmup iterations
> using checkpoint value 600000000 for total number of iterations
> using checkpoint value cosine for decay style
[2022-02-06 03:51:11,457] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 16
[2022-02-06 03:51:12,168] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO
state_dicts for rank 32 [2022-02-06 03:51:12,880] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 104 [2022-02-06 03:51:12,930] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 107 [2022-02-06 03:51:12,970] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 16 [2022-02-06 03:51:13,546] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 32 [2022-02-06 03:51:13,916] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 33 [2022-02-06 03:51:14,441] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 104 [2022-02-06 03:51:14,550] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 107 [2022-02-06 03:51:14,681] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 5 [2022-02-06 03:51:14,963] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 38 [2022-02-06 03:51:15,316] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 33 [2022-02-06 03:51:16,087] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 5 [2022-02-06 03:51:16,256] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 83 [2022-02-06 03:51:16,362] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 38 [2022-02-06 03:51:16,383] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 58 [2022-02-06 03:51:16,634] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 56 [2022-02-06 03:51:16,814] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 86 
[2022-02-06 03:51:17,286] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 26 [2022-02-06 03:51:17,361] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 99 [2022-02-06 03:51:17,599] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 72 [2022-02-06 03:51:17,646] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 83 [2022-02-06 03:51:17,742] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 61 [2022-02-06 03:51:17,851] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 58 [2022-02-06 03:51:18,102] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 31 [2022-02-06 03:51:18,108] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 56 [2022-02-06 03:51:18,114] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 40 [2022-02-06 03:51:18,354] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 9 [2022-02-06 03:51:18,409] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 86 [2022-02-06 03:51:18,473] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 4 [2022-02-06 03:51:18,707] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 26 [2022-02-06 03:51:18,781] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 99 [2022-02-06 03:51:18,916] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 7 [2022-02-06 03:51:19,025] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 72 [2022-02-06 03:51:19,105] 
[INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 97 [2022-02-06 03:51:19,119] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 14 [2022-02-06 03:51:19,126] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 30 [2022-02-06 03:51:19,127] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 81 [2022-02-06 03:51:19,259] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 108 [2022-02-06 03:51:19,338] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 13 [2022-02-06 03:51:19,352] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 76 [2022-02-06 03:51:19,389] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 61 [2022-02-06 03:51:19,398] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 37 [2022-02-06 03:51:19,532] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 40 [2022-02-06 03:51:19,605] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 31 [2022-02-06 03:51:19,681] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 43 [2022-02-06 03:51:19,760] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 57 [2022-02-06 03:51:19,761] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 101 [2022-02-06 03:51:19,795] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 9 [2022-02-06 03:51:19,950] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 4 [2022-02-06 03:51:19,952] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 28 [2022-02-06 03:51:20,073] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 120 [2022-02-06 03:51:20,137] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 80 [2022-02-06 03:51:20,413] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 7 [2022-02-06 03:51:20,422] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 48 [2022-02-06 03:51:20,467] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 106 [2022-02-06 03:51:20,562] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 81 [2022-02-06 03:51:20,584] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 78 [2022-02-06 03:51:20,591] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 69 [2022-02-06 03:51:20,605] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 14 [2022-02-06 03:51:20,625] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 30 [2022-02-06 03:51:20,642] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 108 [2022-02-06 03:51:20,664] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 29 [2022-02-06 03:51:20,736] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 76 [2022-02-06 03:51:20,793] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 13 [2022-02-06 03:51:20,794] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 121 [2022-02-06 03:51:20,901] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 102 [2022-02-06 03:51:21,009] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 100 [2022-02-06 03:51:21,021] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 97 [2022-02-06 03:51:21,065] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 109 [2022-02-06 03:51:21,079] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 6 [2022-02-06 03:51:21,082] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 37 [2022-02-06 03:51:21,095] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 43 [2022-02-06 03:51:21,179] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 52 [2022-02-06 03:51:21,222] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 53 [2022-02-06 03:51:21,250] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 57 [2022-02-06 03:51:21,264] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 101 [2022-02-06 03:51:21,307] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 42 [2022-02-06 03:51:21,355] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 28 [2022-02-06 03:51:21,356] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 125 [2022-02-06 03:51:21,393] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 93 [2022-02-06 03:51:21,444] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 117 [2022-02-06 03:51:21,542] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 10 [2022-02-06 03:51:21,618] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 120 [2022-02-06 03:51:21,633] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 84 [2022-02-06 03:51:21,699] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 80 [2022-02-06 03:51:21,712] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 98 [2022-02-06 03:51:21,720] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 82 [2022-02-06 03:51:21,724] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 127 [2022-02-06 03:51:21,778] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 25 [2022-02-06 03:51:21,792] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 48 [2022-02-06 03:51:21,802] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 96 [2022-02-06 03:51:21,837] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 79 [2022-02-06 03:51:21,990] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 106 [2022-02-06 03:51:22,015] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 78 [2022-02-06 03:51:22,038] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 22 [2022-02-06 03:51:22,046] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 69 [2022-02-06 03:51:22,048] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 112 [2022-02-06 03:51:22,094] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 36 [2022-02-06 03:51:22,105] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 85 [2022-02-06 03:51:22,106] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 29 [2022-02-06 03:51:22,240] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 35 [2022-02-06 03:51:22,260] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 121 [2022-02-06 03:51:22,357] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 20 [2022-02-06 03:51:22,360] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 102 [2022-02-06 03:51:22,376] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 88 [2022-02-06 03:51:22,380] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 62 [2022-02-06 03:51:22,415] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 103 [2022-02-06 03:51:22,425] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 100 [2022-02-06 03:51:22,464] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 6 [2022-02-06 03:51:22,487] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 47 [2022-02-06 03:51:22,514] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 60 [2022-02-06 03:51:22,532] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 109 [2022-02-06 03:51:22,557] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 94 [2022-02-06 03:51:22,591] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 68 [2022-02-06 03:51:22,689] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 67 [2022-02-06 03:51:22,792] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 41 [2022-02-06 03:51:22,802] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 55 [2022-02-06 03:51:22,804] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 122 [2022-02-06 03:51:22,848] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 77 [2022-02-06 03:51:22,848] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 12 [2022-02-06 03:51:22,862] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 90 [2022-02-06 03:51:22,880] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 93 [2022-02-06 03:51:22,886] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 27 [2022-02-06 03:51:22,889] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 117 [2022-02-06 03:51:22,894] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 52 [2022-02-06 03:51:22,964] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 53 [2022-02-06 03:51:22,977] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 10 [2022-02-06 03:51:23,009] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 64 [2022-02-06 03:51:23,014] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 17 [2022-02-06 03:51:23,038] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 95 [2022-02-06 03:51:23,047] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 11 [2022-02-06 03:51:23,076] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 51 [2022-02-06 03:51:23,110] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 84 [2022-02-06 03:51:23,111] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 42 [2022-02-06 03:51:23,115] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 92 [2022-02-06 03:51:23,137] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 23 [2022-02-06 03:51:23,137] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 65 [2022-02-06 03:51:23,170] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 105 [2022-02-06 03:51:23,185] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 15 [2022-02-06 03:51:23,190] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 98 [2022-02-06 03:51:23,196] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 125 [2022-02-06 03:51:23,251] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 44 [2022-02-06 03:51:23,270] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 96 [2022-02-06 03:51:23,280] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 79 [2022-02-06 03:51:23,320] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 66 [2022-02-06 03:51:23,387] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 63 [2022-02-06 03:51:23,389] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 70 [2022-02-06 03:51:23,445] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 25 [2022-02-06 03:51:23,510] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 82 [2022-02-06 03:51:23,512] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 126 [2022-02-06 03:51:23,513] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 110 [2022-02-06 03:51:23,540] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 21 [2022-02-06 03:51:23,576] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 85 [2022-02-06 03:51:23,577] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 112 [2022-02-06 03:51:23,585] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 22 [2022-02-06 03:51:23,593] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 75 [2022-02-06 03:51:23,596] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 8 [2022-02-06 03:51:23,602] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 36 [2022-02-06 03:51:23,605] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 46 [2022-02-06 03:51:23,656] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 19 [2022-02-06 03:51:23,690] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 119 [2022-02-06 03:51:23,695] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 118 [2022-02-06 03:51:23,697] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 127 [2022-02-06 03:51:23,697] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 113 [2022-02-06 03:51:23,711] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 73 [2022-02-06 03:51:23,714] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 50 [2022-02-06 03:51:23,715] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 124 [2022-02-06 03:51:23,735] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 71 [2022-02-06 03:51:23,769] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 59 [2022-02-06 03:51:23,775] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 103 [2022-02-06 03:51:23,784] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 114 [2022-02-06 03:51:23,807] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 89 [2022-02-06 03:51:23,894] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 62 [2022-02-06 03:51:24,018] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 39 [2022-02-06 03:51:24,022] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 34 [2022-02-06 03:51:24,029] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 88 [2022-02-06 03:51:24,075] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 123 [2022-02-06 03:51:24,138] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 115 [2022-02-06 03:51:24,150] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 91 [2022-02-06 03:51:24,152] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 49 [2022-02-06 03:51:24,205] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 47 [2022-02-06 03:51:24,209] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 41 [2022-02-06 03:51:24,216] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 68 [2022-02-06 03:51:24,231] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 60 [2022-02-06 03:51:24,254] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 12 [2022-02-06 03:51:24,274] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 67 [2022-02-06 03:51:24,304] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 27 [2022-02-06 03:51:24,343] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 122 [2022-02-06 03:51:24,355] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 20 [2022-02-06 03:51:24,405] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 18 [2022-02-06 03:51:24,474] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 55 [2022-02-06 03:51:24,476] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 35 [2022-02-06 03:51:24,488] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 11 [2022-02-06 03:51:24,519] [INFO] [engine.py:2618:_load_zero_checkpoint] 
loading 1 zero partition checkpoints for rank 77 [2022-02-06 03:51:24,602] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 94 [2022-02-06 03:51:24,627] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 74 [2022-02-06 03:51:24,636] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 105 [2022-02-06 03:51:24,638] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 111 [2022-02-06 03:51:24,655] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 90 [2022-02-06 03:51:24,661] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 51 [2022-02-06 03:51:24,722] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 3 [2022-02-06 03:51:24,748] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 2 [2022-02-06 03:51:24,757] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 44 [2022-02-06 03:51:24,765] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 24 [2022-02-06 03:51:24,783] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 15 [2022-02-06 03:51:24,812] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 64 [2022-02-06 03:51:24,858] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 1 [2022-02-06 03:51:24,921] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 70 [2022-02-06 03:51:24,939] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 65 [2022-02-06 03:51:25,012] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for 
rank 95 [2022-02-06 03:51:25,032] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 110 [2022-02-06 03:51:25,067] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 8 [2022-02-06 03:51:25,098] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 66 [2022-02-06 03:51:25,100] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 116 [2022-02-06 03:51:25,127] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 0 [2022-02-06 03:51:25,139] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 92 [2022-02-06 03:51:25,141] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 75 [2022-02-06 03:51:25,160] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 17 [2022-02-06 03:51:25,191] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 19 [2022-02-06 03:51:25,232] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 50 [2022-02-06 03:51:25,233] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 45 [2022-02-06 03:51:25,235] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 87 [2022-02-06 03:51:25,248] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 73 [2022-02-06 03:51:25,273] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 63 [2022-02-06 03:51:25,314] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 23 [2022-02-06 03:51:25,347] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 46 [2022-02-06 03:51:25,360] [INFO] 
[engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 126 [2022-02-06 03:51:25,372] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 118 [2022-02-06 03:51:25,446] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 39 [2022-02-06 03:51:25,492] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 119 [2022-02-06 03:51:25,501] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 89 [2022-02-06 03:51:25,506] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 54 [2022-02-06 03:51:25,531] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 59 [2022-02-06 03:51:25,588] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 71 [2022-02-06 03:51:25,610] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 114 [2022-02-06 03:51:25,632] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 113 [2022-02-06 03:51:25,649] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 91 [2022-02-06 03:51:25,657] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 123 [2022-02-06 03:51:25,665] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 21 [2022-02-06 03:51:25,741] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 34 [2022-02-06 03:51:25,815] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 115 [2022-02-06 03:51:25,848] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 18 [2022-02-06 03:51:25,892] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 
zero partition checkpoints for rank 124 [2022-02-06 03:51:26,012] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 111 [2022-02-06 03:51:26,024] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 49 [2022-02-06 03:51:26,205] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 74 [2022-02-06 03:51:26,292] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 24 [2022-02-06 03:51:26,608] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 45 [2022-02-06 03:51:26,661] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 87 [2022-02-06 03:51:26,845] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 1 [2022-02-06 03:51:26,892] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 2 [2022-02-06 03:51:26,892] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 3 [2022-02-06 03:51:26,935] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 116 [2022-02-06 03:51:26,946] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 54 [2022-02-06 03:51:27,247] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 0 checkpoint version 3.0 successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 at iteration 16714 time (ms) | load-checkpoint: 37062.10 /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/utils.py:279: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP 
> 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 125.22432 estimated model parameters without embeddings: 103.368064 [after model, optimizer, and learning rate scheduler are built] datetime: 2022-02-06 03:51:27 > building train, validation, and test datasets ... > datasets target sizes (minimum size): train: 600000000 validation: 20008960 test: 10240 > building train, validation, and test datasets for GPT ... > building dataset index ... reading sizes... reading pointers... reading document index... creating numpy buffer of mmap... creating memory view of numpy buffer... > finished creating indexed dataset in 0.075731 seconds number of documents: 304230423 > dataset split: train: document indices in [0, 288714672) total of 288714672 documents validation: document indices in [288714672, 303926193) total of 15211521 documents test: document indices in [303926193, 304230423) total of 304230 documents > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_doc_idx.npy > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_sample_idx.npy > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_shuffle_idx.npy loaded indexed file in 0.198 seconds total number of samples: 657686117 total number of epochs: 5 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_20008960ns_2048sl_43s_doc_idx.npy > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_20008960ns_2048sl_43s_sample_idx.npy > loading shuffle-idx mapping from 
/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_20008960ns_2048sl_43s_shuffle_idx.npy loaded indexed file in 0.152 seconds total number of samples: 20781483 total number of epochs: 3 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_doc_idx.npy > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_sample_idx.npy > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_shuffle_idx.npy loaded indexed file in 0.076 seconds total number of samples: 137384 total number of epochs: 1 > finished creating GPT datasets ... [after dataloaders are built] datetime: 2022-02-06 03:51:34 done with setup ... training ... Number of parameters: [tensor rank - pipeline rank] w/ and w/o embeddings: time (ms) | model-and-optimizer-setup: 49303.47 | train/valid/test-data-iterators-setup: 7016.30 [001-001] 103.3651B / 103.3651B [002-000] 125.2243B / 103.3681B [001-000] 125.2243B / 103.3681B [002-007] 103.3651B / 103.3651B[001-006] 103.3651B / 103.3651B [002-020] 103.3651B / 103.3651B[001-020] 103.3651B / 103.3651B [003-020] 103.3651B / 103.3651B [002-018] 103.3651B / 103.3651B [002-001] 103.3651B / 103.3651B [001-026] 103.3651B / 103.3651B[003-027] 103.3651B / 103.3651B [003-016] 103.3651B / 103.3651B[002-016] 103.3651B / 103.3651B [001-017] 103.3651B / 103.3651B [003-017] 103.3651B / 103.3651B [001-005] 103.3651B / 103.3651B[001-004] 103.3651B / 103.3651B [003-004] 103.3651B / 103.3651B [001-024] 103.3651B / 103.3651B [002-009] 103.3651B / 103.3651B[002-008] 103.3651B / 103.3651B[003-008] 103.3651B / 103.3651B [003-031] 125.2273B / 103.3710B [002-011] 103.3651B / 103.3651B[001-011] 103.3651B / 103.3651B [002-010] 103.3651B / 103.3651B [001-013] 103.3651B / 103.3651B[003-013] 103.3651B / 
103.3651B [003-015] 103.3651B / 103.3651B[002-014] 103.3651B / 103.3651B [003-014] 103.3651B / 103.3651B [001-007] 103.3651B / 103.3651B [001-021] 103.3651B / 103.3651B [002-019] 103.3651B / 103.3651B[003-018] 103.3651B / 103.3651B [003-028] 103.3651B / 103.3651B[003-029] 103.3651B / 103.3651B [001-029] 103.3651B / 103.3651B [003-001] 103.3651B / 103.3651B [001-002] 103.3651B / 103.3651B[003-003] 103.3651B / 103.3651B[002-002] 103.3651B / 103.3651B [003-002] 103.3651B / 103.3651B [003-026] 103.3651B / 103.3651B[002-026] 103.3651B / 103.3651B [003-007] 103.3651B / 103.3651B[003-006] 103.3651B / 103.3651B [002-017] 103.3651B / 103.3651B [001-016] 103.3651B / 103.3651B [002-004] 103.3651B / 103.3651B [002-021] 103.3651B / 103.3651B [001-018] 103.3651B / 103.3651B [001-022] 103.3651B / 103.3651B[001-023] 103.3651B / 103.3651B [003-024] 103.3651B / 103.3651B[002-024] 103.3651B / 103.3651B [001-008] 103.3651B / 103.3651B [001-009] 103.3651B / 103.3651B [003-030] 103.3651B / 103.3651B [003-000] 125.2243B / 103.3681B [001-010] 103.3651B / 103.3651B [003-010] 103.3651B / 103.3651B [001-027] 103.3651B / 103.3651B[002-027] 103.3651B / 103.3651B [001-012] 103.3651B / 103.3651B[002-012] 103.3651B / 103.3651B [002-015] 103.3651B / 103.3651B[001-015] 103.3651B / 103.3651B[001-014] 103.3651B / 103.3651B [002-006] 103.3651B / 103.3651B [003-005] 103.3651B / 103.3651B [002-005] 103.3651B / 103.3651B [003-021] 103.3651B / 103.3651B [001-019] 103.3651B / 103.3651B [003-019] 103.3651B / 103.3651B [002-028] 103.3651B / 103.3651B[001-028] 103.3651B / 103.3651B [002-029] 103.3651B / 103.3651B [003-025] 103.3651B / 103.3651B [002-025] 103.3651B / 103.3651B[001-025] 103.3651B / 103.3651B [003-009] 103.3651B / 103.3651B [001-030] 103.3651B / 103.3651B[001-031] 125.2273B / 103.3710B [003-011] 103.3651B / 103.3651B [002-003] 103.3651B / 103.3651B [000-026] 103.3651B / 103.3651B [003-012] 103.3651B / 103.3651B[002-013] 103.3651B / 103.3651B [000-014] 103.3651B / 103.3651B [003-023] 103.3651B / 
103.3651B[002-023] 103.3651B / 103.3651B [002-022] 103.3651B / 103.3651B [003-022] 103.3651B / 103.3651B [002-030] 103.3651B / 103.3651B [000-010] 103.3651B / 103.3651B [001-003] 103.3651B / 103.3651B [000-027] 103.3651B / 103.3651B [000-012] 103.3651B / 103.3651B [000-005] 103.3651B / 103.3651B [002-031] 125.2273B / 103.3710B [000-011] 103.3651B / 103.3651B [000-029] 103.3651B / 103.3651B [000-006] 103.3651B / 103.3651B[000-007] 103.3651B / 103.3651B [000-022] 103.3651B / 103.3651B [000-002] 103.3651B / 103.3651B [000-016] 103.3651B / 103.3651B[000-017] 103.3651B / 103.3651B [000-008] 103.3651B / 103.3651B [000-023] 103.3651B / 103.3651B [000-004] 103.3651B / 103.3651B [000-031] 125.2273B / 103.3710B [000-015] 103.3651B / 103.3651B [000-025] 103.3651B / 103.3651B[000-024] 103.3651B / 103.3651B [000-019] 103.3651B / 103.3651B [000-030] 103.3651B / 103.3651B [000-013] 103.3651B / 103.3651B [000-000] 125.2243B / 103.3681B [000-020] 103.3651B / 103.3651B [000-009] 103.3651B / 103.3651B [000-021] 103.3651B / 103.3651B [000-018] 103.3651B / 103.3651B [000-028] 103.3651B / 103.3651B [000-003] 103.3651B / 103.3651B [000-001] 103.3651B / 103.3651B [before the start of training step] datetime: 2022-02-06 03:51:34 [2022-02-06 03:51:34,998] [INFO] [checkpointing.py:547:forward] Activation Checkpointing Information [2022-02-06 03:51:34,998] [INFO] [checkpointing.py:548:forward] ----Partition Activations False, CPU CHECKPOINTING False [2022-02-06 03:51:34,998] [INFO] [checkpointing.py:551:forward] ----contiguous Memory Checkpointing False with 64 total layers [2022-02-06 03:51:34,998] [INFO] [checkpointing.py:554:forward] ----Synchronization False [2022-02-06 03:51:34,998] [INFO] [checkpointing.py:555:forward] ----Profiling time in checkpointing False [Rank 124] (after 16715 iterations) memory (MB) | allocated: 13257.57958984375 | max allocated: 20721.92822265625 | reserved: 24404.0 | max reserved: 24404.0 [Rank 122] (after 16715 iterations) memory (MB) | allocated: 
10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 4] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 126] (after 16715 iterations) memory (MB) | allocated: 13257.57958984375 | max allocated: 20721.24072265625 | reserved: 24404.0 | max reserved: 24404.0 [Rank 0] (after 16715 iterations) memory (MB) | allocated: 13207.5400390625 | max allocated: 20671.15625 | reserved: 24404.0 | max reserved: 24404.0 [Rank 8] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 16] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 24] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 20] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 28] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 44] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 12] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 40] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 36] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 32] (after 16715 
iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 125] (after 16715 iterations) memory (MB) | allocated: 13258.26513671875 | max allocated: 20722.61376953125 | reserved: 24404.0 | max reserved: 24404.0 [Rank 123] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 48] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 52] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 56] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 60] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 68] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0[Rank 64] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 72] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.6689453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 76] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 iteration 16715/ 292968 | consumed samples: 34232320 | consumed tokens: 17822154752 | elapsed time per iteration (ms): 225439.4 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.695995E+00 | loss scale: 65536.0 | grad norm: 65829.463 | num zeros: 0.0 | curriculum seqlen: 984 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.009 | TFLOPs: 57.75 | [Rank 21] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 9] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 5] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0[Rank 1] (after 16715 iterations) memory (MB) | allocated: 13207.46142578125 | max allocated: 20671.07763671875 | reserved: 24404.0 | max reserved: 24404.0 [Rank 13] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 17] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 127] (after 16715 iterations) memory (MB) | allocated: 13258.26513671875 | max allocated: 20722.61376953125 | reserved: 24404.0 | max reserved: 24404.0 [Rank 84] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 112] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 104] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 108] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.83154296875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 92] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.6689453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 
96] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0[Rank 100] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 88] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 80] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.6689453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 120] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 116] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 33] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 37] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 29] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 25] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 3] (after 16715 iterations) memory (MB) | allocated: 13207.76171875 | max allocated: 20671.3779296875 | reserved: 24404.0 | max reserved: 24404.0 [Rank 7] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 6] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 
20072.0 | max reserved: 20072.0 [Rank 2] (after 16715 iterations) memory (MB) | allocated: 13208.78271484375 | max allocated: 20672.39892578125 | reserved: 24404.0 | max reserved: 24404.0 [Rank 10] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0[Rank 11] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 31] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 22] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 23] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 19] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 27] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 18] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 34] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 30] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.6689453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 42] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 26] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max 
allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 35] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.6689453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 14] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 38] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 39] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 41] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 43] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0[Rank 47] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 15] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 55] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 45] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 46] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 62] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 58] (after 16715 iterations) memory 
(MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 51] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0[Rank 53] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.6689453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 59] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 77] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 54] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 61] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.6689453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 50] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 49] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 63] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 70] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 67] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0[Rank 66] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 
[Rank 79] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 69] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0[Rank 71] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 73] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 95] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.6689453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 65] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 101] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 83] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 81] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 94] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.6689453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 57] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 78] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 82] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.6689453125 | 
reserved: 20072.0 | max reserved: 20072.0 [Rank 75] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.6689453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 74] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.6689453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 103] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 102] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 86] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 99] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 85] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 87] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 93] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.6689453125 | reserved: 20072.0 | max reserved: 20072.0[Rank 91] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 97] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0[Rank 98] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 105] (after 16715 iterations) memory (MB) | allocated: 
10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 89] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 119] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 115] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 111] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 110] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 114] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 90] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 109] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 107] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 118] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0[Rank 117] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 106] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 
113] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 121] (after 16715 iterations) memory (MB) | allocated: 10797.29150390625 | max allocated: 16957.47314453125 | reserved: 20072.0 | max reserved: 20072.0 iteration 16716/ 292968 | consumed samples: 34234368 | consumed tokens: 17824169984 | elapsed time per iteration (ms): 154902.3 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.689115E+00 | loss scale: 65536.0 | grad norm: 72280.643 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.05 | iteration 16717/ 292968 | consumed samples: 34236416 | consumed tokens: 17826185216 | elapsed time per iteration (ms): 153395.7 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.673409E+00 | loss scale: 65536.0 | grad norm: 75937.164 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.88 | iteration 16718/ 292968 | consumed samples: 34238464 | consumed tokens: 17828200448 | elapsed time per iteration (ms): 149557.4 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.664764E+00 | loss scale: 65536.0 | grad norm: 58387.333 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.06 | iteration 16719/ 292968 | consumed samples: 34240512 | consumed tokens: 17830215680 | elapsed time per iteration (ms): 149973.2 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.711737E+00 | loss scale: 65536.0 | grad norm: 60836.654 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.81 | iteration 16720/ 292968 | consumed samples: 34242560 | consumed tokens: 
17832230912 | elapsed time per iteration (ms): 150543.0 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.681549E+00 | loss scale: 65536.0 | grad norm: 85215.237 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.49 | iteration 16721/ 292968 | consumed samples: 34244608 | consumed tokens: 17834246144 | elapsed time per iteration (ms): 149425.1 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.691973E+00 | loss scale: 65536.0 | grad norm: 56684.280 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.13 | iteration 16722/ 292968 | consumed samples: 34246656 | consumed tokens: 17836261376 | elapsed time per iteration (ms): 148960.0 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.686302E+00 | loss scale: 65536.0 | grad norm: 70089.819 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.40 | iteration 16723/ 292968 | consumed samples: 34248704 | consumed tokens: 17838276608 | elapsed time per iteration (ms): 151629.4 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.694614E+00 | loss scale: 65536.0 | grad norm: 58235.202 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.87 | iteration 16724/ 292968 | consumed samples: 34250752 | consumed tokens: 17840291840 | elapsed time per iteration (ms): 149145.4 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.675303E+00 | loss scale: 65536.0 | grad norm: 63557.647 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.30 | iteration 16725/ 292968 | consumed samples: 34252800 | 
consumed tokens: 17842307072 | elapsed time per iteration (ms): 148648.9 | learning rate: 5.938E-05 | global batch size: 2048 | lm loss: 2.703145E+00 | loss scale: 65536.0 | grad norm: 60796.854 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.59 | iteration 16726/ 292968 | consumed samples: 34254848 | consumed tokens: 17844322304 | elapsed time per iteration (ms): 150284.2 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.647309E+00 | loss scale: 65536.0 | grad norm: 58684.214 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.63 | iteration 16727/ 292968 | consumed samples: 34256896 | consumed tokens: 17846337536 | elapsed time per iteration (ms): 149282.1 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.688189E+00 | loss scale: 65536.0 | grad norm: 64078.714 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.22 | iteration 16728/ 292968 | consumed samples: 34258944 | consumed tokens: 17848352768 | elapsed time per iteration (ms): 149096.9 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.676341E+00 | loss scale: 65536.0 | grad norm: 72106.462 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.32 | iteration 16729/ 292968 | consumed samples: 34260992 | consumed tokens: 17850368000 | elapsed time per iteration (ms): 149115.5 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.661130E+00 | loss scale: 65536.0 | grad norm: 74589.654 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.31 | iteration 16730/ 292968 | consumed samples: 
34263040 | consumed tokens: 17852383232 | elapsed time per iteration (ms): 149199.3 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.662014E+00 | loss scale: 65536.0 | grad norm: 71387.974 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.26 | iteration 16731/ 292968 | consumed samples: 34265088 | consumed tokens: 17854398464 | elapsed time per iteration (ms): 150790.9 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.657756E+00 | loss scale: 65536.0 | grad norm: 69623.809 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.34 | iteration 16732/ 292968 | consumed samples: 34267136 | consumed tokens: 17856413696 | elapsed time per iteration (ms): 149443.4 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.653445E+00 | loss scale: 65536.0 | grad norm: 71629.816 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.12 | iteration 16733/ 292968 | consumed samples: 34269184 | consumed tokens: 17858428928 | elapsed time per iteration (ms): 148885.4 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.679513E+00 | loss scale: 65536.0 | grad norm: 51383.248 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.45 | iteration 16734/ 292968 | consumed samples: 34271232 | consumed tokens: 17860444160 | elapsed time per iteration (ms): 150851.0 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.693595E+00 | loss scale: 65536.0 | grad norm: 49359.983 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.31 | iteration 16735/ 292968 | 
consumed samples: 34273280 | consumed tokens: 17862459392 | elapsed time per iteration (ms): 167342.0 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.683416E+00 | loss scale: 65536.0 | grad norm: 65177.526 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 77.80 | iteration 16736/ 292968 | consumed samples: 34275328 | consumed tokens: 17864474624 | elapsed time per iteration (ms): 148403.4 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.664752E+00 | loss scale: 65536.0 | grad norm: 68647.079 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.73 | iteration 16737/ 292968 | consumed samples: 34277376 | consumed tokens: 17866489856 | elapsed time per iteration (ms): 148993.4 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.689183E+00 | loss scale: 65536.0 | grad norm: 51730.830 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.39 | iteration 16738/ 292968 | consumed samples: 34279424 | consumed tokens: 17868505088 | elapsed time per iteration (ms): 150751.7 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.676425E+00 | loss scale: 65536.0 | grad norm: 47846.766 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.37 | iteration 16739/ 292968 | consumed samples: 34281472 | consumed tokens: 17870520320 | elapsed time per iteration (ms): 148873.7 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.672023E+00 | loss scale: 65536.0 | grad norm: 58825.173 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.46 | iteration 
16740/ 292968 | consumed samples: 34283520 | consumed tokens: 17872535552 | elapsed time per iteration (ms): 149131.4 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.660465E+00 | loss scale: 65536.0 | grad norm: 74992.488 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.30 | iteration 16741/ 292968 | consumed samples: 34285568 | consumed tokens: 17874550784 | elapsed time per iteration (ms): 149383.3 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.678809E+00 | loss scale: 65536.0 | grad norm: 64346.883 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.16 | iteration 16742/ 292968 | consumed samples: 34287616 | consumed tokens: 17876566016 | elapsed time per iteration (ms): 151318.6 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.692611E+00 | loss scale: 65536.0 | grad norm: 76381.943 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.04 | iteration 16743/ 292968 | consumed samples: 34289664 | consumed tokens: 17878581248 | elapsed time per iteration (ms): 150848.1 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.685162E+00 | loss scale: 65536.0 | grad norm: 52543.066 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.31 | iteration 16744/ 292968 | consumed samples: 34291712 | consumed tokens: 17880596480 | elapsed time per iteration (ms): 149092.6 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.677940E+00 | loss scale: 65536.0 | grad norm: 52287.260 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.33 | 
iteration 16745/ 292968 | consumed samples: 34293760 | consumed tokens: 17882611712 | elapsed time per iteration (ms): 148930.1 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.698112E+00 | loss scale: 65536.0 | grad norm: 77688.091 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.42 | iteration 16746/ 292968 | consumed samples: 34295808 | consumed tokens: 17884626944 | elapsed time per iteration (ms): 149561.4 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.658407E+00 | loss scale: 65536.0 | grad norm: 72346.466 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.05 | iteration 16747/ 292968 | consumed samples: 34297856 | consumed tokens: 17886642176 | elapsed time per iteration (ms): 148740.2 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.701191E+00 | loss scale: 65536.0 | grad norm: 57566.900 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.53 | iteration 16748/ 292968 | consumed samples: 34299904 | consumed tokens: 17888657408 | elapsed time per iteration (ms): 148627.3 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.687770E+00 | loss scale: 65536.0 | grad norm: 102331.086 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.60 | iteration 16749/ 292968 | consumed samples: 34301952 | consumed tokens: 17890672640 | elapsed time per iteration (ms): 150645.4 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.673578E+00 | loss scale: 65536.0 | grad norm: 35988.186 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | 
TFLOPs: 86.43 | iteration 16750/ 292968 | consumed samples: 34304000 | consumed tokens: 17892687872 | elapsed time per iteration (ms): 148452.9 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.688703E+00 | loss scale: 65536.0 | grad norm: 70698.687 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.70 | saving checkpoint at iteration 16750 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 [2022-02-06 05:23:28,848] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/mp_rank_00_model_states.pt [2022-02-06 05:23:29,022] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/mp_rank_01_model_states.pt [2022-02-06 05:24:20,465] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-02-06 05:24:20,683] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-02-06 05:24:20,714] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-02-06 05:24:20,790] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-02-06 05:24:21,519] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-02-06 05:24:21,670] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-02-06 05:24:22,075] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-02-06 05:24:22,133] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-02-06 05:24:22,800] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-02-06 05:24:23,133] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-02-06 05:24:23,261] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-02-06 05:24:23,609] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-02-06 05:24:24,189] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-02-06 05:24:24,253] [INFO] [engine.py:3023:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-02-06 05:24:24,281] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-02-06 05:24:24,416] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-02-06 05:24:24,462] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-02-06 05:24:24,487] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-02-06 05:24:24,584] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-02-06 05:24:25,029] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-02-06 05:24:25,176] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-02-06 05:24:25,178] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-02-06 05:24:25,261] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-02-06 05:24:25,283] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-02-06 05:24:27,796] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-02-06 05:24:28,354] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-02-06 05:24:28,772] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-02-06 05:24:29,096] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-02-06 05:24:29,198] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-02-06 05:24:29,515] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-02-06 05:24:29,634] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-02-06 
05:24:29,892] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-02-06 05:24:29,931] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-02-06 05:24:30,168] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-02-06 05:24:30,583] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-02-06 05:24:30,883] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-02-06 05:24:31,036] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-02-06 05:24:31,061] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-02-06 05:24:31,081] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-02-06 05:24:31,104] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-02-06 05:24:31,156] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-02-06 05:24:31,314] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-02-06 05:24:31,534] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-02-06 05:24:31,625] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-02-06 05:24:31,750] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-02-06 05:24:31,911] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-02-06 05:24:32,033] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-02-06 05:24:32,291] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-02-06 05:24:32,593] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-02-06 05:24:32,593] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-02-06 05:24:32,847] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-02-06 05:24:33,022] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-02-06 05:24:33,066] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-02-06 05:24:33,116] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-02-06 05:24:33,140] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-02-06 05:24:33,265] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-02-06 05:24:33,275] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-02-06 05:24:33,708] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-02-06 05:24:33,761] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-02-06 05:24:34,868] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-02-06 05:24:34,922] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-02-06 05:24:35,313] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-02-06 05:24:35,326] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-02-06 05:24:35,232] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-02-06 05:24:35,455] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-02-06 05:24:35,522] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-02-06 
05:24:36,048] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-02-06 05:24:36,119] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-02-06 05:24:36,143] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-02-06 05:24:36,518] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-02-06 05:24:36,544] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-02-06 05:24:36,565] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-02-06 05:24:36,631] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-02-06 05:24:36,643] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-02-06 05:24:36,912] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-02-06 05:24:37,025] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-02-06 05:24:37,089] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-02-06 05:24:37,354] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-02-06 05:24:37,469] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-02-06 05:24:37,484] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-02-06 05:24:38,390] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-02-06 05:24:38,673] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-02-06 05:24:38,567] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-02-06 05:24:38,568] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-02-06 05:24:38,772] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-02-06 05:24:38,686] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-02-06 05:24:39,186] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-02-06 05:24:39,519] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-02-06 05:24:39,586] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-02-06 05:24:39,620] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-02-06 05:24:39,555] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-02-06 05:24:40,095] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-02-06 05:24:40,739] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-02-06 05:24:40,863] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-02-06 05:24:40,931] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-02-06 05:24:42,026] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-02-06 05:24:42,061] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-02-06 05:24:42,145] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-02-06 05:24:42,175] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-02-06 05:24:42,287] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_124_optim_states.pt [2022-02-06 05:24:42,299] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-02-06 
05:24:42,449] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-02-06 05:24:42,478] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-02-06 05:24:42,491] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-02-06 05:24:42,556] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-02-06 05:24:42,878] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-02-06 05:24:42,983] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-02-06 05:24:43,221] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-02-06 05:24:43,374] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-02-06 05:24:43,399] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-02-06 05:24:43,903] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-02-06 05:24:43,977] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-02-06 05:24:45,311] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-02-06 05:24:45,570] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-02-06 05:24:45,609] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-02-06 05:24:46,281] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-02-06 05:24:46,386] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-02-06 05:24:49,034] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-02-06 05:24:49,729] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-02-06 05:24:50,022] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-02-06 05:24:51,197] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-02-06 05:24:51,204] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-02-06 05:25:02,404] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-02-06 05:25:08,459] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-02-06 05:25:15,366] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-02-06 05:25:15,425] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-02-06 05:25:16,630] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-02-06 05:25:17,077] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16750/zero_pp_rank_0_mp_rank_32_optim_states.pt successfully saved checkpoint at iteration 16750 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 time (ms) | save-checkpoint: 134420.67 iteration 16751/ 292968 | consumed samples: 34306048 | consumed tokens: 17894703104 | elapsed time per iteration (ms): 283177.8 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.676868E+00 | loss scale: 65536.0 | grad norm: 81128.047 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.007 | TFLOPs: 45.98 | iteration 16752/ 292968 | consumed samples: 34308096 | consumed tokens: 17896718336 | elapsed time per iteration (ms): 148581.9 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.681266E+00 | loss scale: 65536.0 | grad norm: 49120.034 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.63 | iteration 16753/ 292968 | consumed samples: 34310144 | consumed tokens: 17898733568 | elapsed time per iteration (ms): 150861.8 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.697527E+00 | loss scale: 65536.0 | grad norm: 57466.661 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.30 | iteration 16754/ 292968 | consumed samples: 34312192 | consumed tokens: 17900748800 | elapsed time per iteration (ms): 148998.8 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.692980E+00 | loss scale: 65536.0 | grad norm: 87198.838 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.38 | iteration 16755/ 292968 | consumed 
samples: 34314240 | consumed tokens: 17902764032 | elapsed time per iteration (ms): 148787.9 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.678875E+00 | loss scale: 65536.0 | grad norm: 80100.917 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.51 | iteration 16756/ 292968 | consumed samples: 34316288 | consumed tokens: 17904779264 | elapsed time per iteration (ms): 148896.3 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.659333E+00 | loss scale: 65536.0 | grad norm: 40743.839 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.44 | iteration 16757/ 292968 | consumed samples: 34318336 | consumed tokens: 17906794496 | elapsed time per iteration (ms): 148562.6 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.676084E+00 | loss scale: 65536.0 | grad norm: 84581.709 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.64 | iteration 16758/ 292968 | consumed samples: 34320384 | consumed tokens: 17908809728 | elapsed time per iteration (ms): 148702.7 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.673822E+00 | loss scale: 65536.0 | grad norm: 76308.981 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.56 | iteration 16759/ 292968 | consumed samples: 34322432 | consumed tokens: 17910824960 | elapsed time per iteration (ms): 149201.1 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.641974E+00 | loss scale: 65536.0 | grad norm: 62734.755 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.26 | iteration 16760/ 292968 
| consumed samples: 34324480 | consumed tokens: 17912840192 | elapsed time per iteration (ms): 148777.4 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.694741E+00 | loss scale: 65536.0 | grad norm: 84779.437 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.51 | iteration 16761/ 292968 | consumed samples: 34326528 | consumed tokens: 17914855424 | elapsed time per iteration (ms): 150000.9 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.679332E+00 | loss scale: 65536.0 | grad norm: 47916.719 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.80 | iteration 16762/ 292968 | consumed samples: 34328576 | consumed tokens: 17916870656 | elapsed time per iteration (ms): 150047.7 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.659752E+00 | loss scale: 65536.0 | grad norm: 84104.364 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.77 | iteration 16763/ 292968 | consumed samples: 34330624 | consumed tokens: 17918885888 | elapsed time per iteration (ms): 149120.8 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.681477E+00 | loss scale: 65536.0 | grad norm: 66544.058 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.31 | iteration 16764/ 292968 | consumed samples: 34332672 | consumed tokens: 17920901120 | elapsed time per iteration (ms): 148531.4 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.686416E+00 | loss scale: 65536.0 | grad norm: 56568.962 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.66 | iteration 
16765/ 292968 | consumed samples: 34334720 | consumed tokens: 17922916352 | elapsed time per iteration (ms): 149697.2 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.679560E+00 | loss scale: 65536.0 | grad norm: 58780.769 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.97 | iteration 16766/ 292968 | consumed samples: 34336768 | consumed tokens: 17924931584 | elapsed time per iteration (ms): 149759.2 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.678341E+00 | loss scale: 65536.0 | grad norm: 72975.288 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.94 | iteration 16767/ 292968 | consumed samples: 34338816 | consumed tokens: 17926946816 | elapsed time per iteration (ms): 148890.0 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.656810E+00 | loss scale: 65536.0 | grad norm: 64192.339 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.45 | iteration 16768/ 292968 | consumed samples: 34340864 | consumed tokens: 17928962048 | elapsed time per iteration (ms): 149475.7 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.686870E+00 | loss scale: 65536.0 | grad norm: 87375.283 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.10 | iteration 16769/ 292968 | consumed samples: 34342912 | consumed tokens: 17930977280 | elapsed time per iteration (ms): 149183.5 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.674013E+00 | loss scale: 65536.0 | grad norm: 54473.692 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.27 | 
iteration 16770/ 292968 | consumed samples: 34344960 | consumed tokens: 17932992512 | elapsed time per iteration (ms): 149572.8 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.692381E+00 | loss scale: 65536.0 | grad norm: 64770.978 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.05 | iteration 16771/ 292968 | consumed samples: 34347008 | consumed tokens: 17935007744 | elapsed time per iteration (ms): 148551.8 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.666720E+00 | loss scale: 65536.0 | grad norm: 74764.918 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.64 | iteration 16772/ 292968 | consumed samples: 34349056 | consumed tokens: 17937022976 | elapsed time per iteration (ms): 148760.7 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.682603E+00 | loss scale: 65536.0 | grad norm: 59799.975 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.52 | iteration 16773/ 292968 | consumed samples: 34351104 | consumed tokens: 17939038208 | elapsed time per iteration (ms): 150609.9 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.678465E+00 | loss scale: 65536.0 | grad norm: 70657.294 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.45 | iteration 16774/ 292968 | consumed samples: 34353152 | consumed tokens: 17941053440 | elapsed time per iteration (ms): 149381.3 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.669746E+00 | loss scale: 65536.0 | grad norm: 82474.725 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | 
TFLOPs: 87.16 | iteration 16775/ 292968 | consumed samples: 34355200 | consumed tokens: 17943068672 | elapsed time per iteration (ms): 149123.4 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.673733E+00 | loss scale: 65536.0 | grad norm: 50678.251 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.31 | iteration 16776/ 292968 | consumed samples: 34357248 | consumed tokens: 17945083904 | elapsed time per iteration (ms): 150855.0 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.661626E+00 | loss scale: 65536.0 | grad norm: 63282.787 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.31 | iteration 16777/ 292968 | consumed samples: 34359296 | consumed tokens: 17947099136 | elapsed time per iteration (ms): 149096.6 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.681395E+00 | loss scale: 65536.0 | grad norm: 92740.539 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.32 | iteration 16778/ 292968 | consumed samples: 34361344 | consumed tokens: 17949114368 | elapsed time per iteration (ms): 148413.9 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.669931E+00 | loss scale: 65536.0 | grad norm: 58366.180 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.73 | iteration 16779/ 292968 | consumed samples: 34363392 | consumed tokens: 17951129600 | elapsed time per iteration (ms): 149008.3 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.693433E+00 | loss scale: 65536.0 | grad norm: 59015.094 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 0.014 | TFLOPs: 87.38 | iteration 16780/ 292968 | consumed samples: 34365440 | consumed tokens: 17953144832 | elapsed time per iteration (ms): 149171.1 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.704262E+00 | loss scale: 65536.0 | grad norm: 70810.490 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.28 | iteration 16781/ 292968 | consumed samples: 34367488 | consumed tokens: 17955160064 | elapsed time per iteration (ms): 149591.0 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.703644E+00 | loss scale: 65536.0 | grad norm: 80160.210 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.04 | iteration 16782/ 292968 | consumed samples: 34369536 | consumed tokens: 17957175296 | elapsed time per iteration (ms): 149787.4 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.688220E+00 | loss scale: 65536.0 | grad norm: 57685.734 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.92 | iteration 16783/ 292968 | consumed samples: 34371584 | consumed tokens: 17959190528 | elapsed time per iteration (ms): 149801.4 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.689552E+00 | loss scale: 65536.0 | grad norm: 84943.076 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.91 | iteration 16784/ 292968 | consumed samples: 34373632 | consumed tokens: 17961205760 | elapsed time per iteration (ms): 149222.8 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.686352E+00 | loss scale: 65536.0 | grad norm: 70022.337 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 0.014 | TFLOPs: 87.25 | iteration 16785/ 292968 | consumed samples: 34375680 | consumed tokens: 17963220992 | elapsed time per iteration (ms): 148941.4 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.715951E+00 | loss scale: 65536.0 | grad norm: 74726.232 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.42 | iteration 16786/ 292968 | consumed samples: 34377728 | consumed tokens: 17965236224 | elapsed time per iteration (ms): 148500.5 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.695132E+00 | loss scale: 65536.0 | grad norm: 59262.316 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.68 | iteration 16787/ 292968 | consumed samples: 34379776 | consumed tokens: 17967251456 | elapsed time per iteration (ms): 149471.8 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.711068E+00 | loss scale: 65536.0 | grad norm: 70143.443 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.11 | iteration 16788/ 292968 | consumed samples: 34381824 | consumed tokens: 17969266688 | elapsed time per iteration (ms): 149137.8 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.684824E+00 | loss scale: 65536.0 | grad norm: 82602.639 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.30 | iteration 16789/ 292968 | consumed samples: 34383872 | consumed tokens: 17971281920 | elapsed time per iteration (ms): 149642.5 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.656608E+00 | loss scale: 65536.0 | grad norm: 43810.235 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 0.014 | TFLOPs: 87.01 | iteration 16790/ 292968 | consumed samples: 34385920 | consumed tokens: 17973297152 | elapsed time per iteration (ms): 150567.3 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.691111E+00 | loss scale: 65536.0 | grad norm: 56035.856 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.47 | iteration 16791/ 292968 | consumed samples: 34387968 | consumed tokens: 17975312384 | elapsed time per iteration (ms): 148742.4 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.700163E+00 | loss scale: 65536.0 | grad norm: 73953.672 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.53 | iteration 16792/ 292968 | consumed samples: 34390016 | consumed tokens: 17977327616 | elapsed time per iteration (ms): 148835.5 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.684481E+00 | loss scale: 65536.0 | grad norm: 56978.035 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.48 | iteration 16793/ 292968 | consumed samples: 34392064 | consumed tokens: 17979342848 | elapsed time per iteration (ms): 149007.9 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.668318E+00 | loss scale: 65536.0 | grad norm: 53870.873 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.38 | iteration 16794/ 292968 | consumed samples: 34394112 | consumed tokens: 17981358080 | elapsed time per iteration (ms): 148791.6 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.673180E+00 | loss scale: 65536.0 | grad norm: 46169.142 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.50 | iteration 16795/ 292968 | consumed samples: 34396160 | consumed tokens: 17983373312 | elapsed time per iteration (ms): 149761.9 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.684832E+00 | loss scale: 65536.0 | grad norm: 48769.583 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.94 | iteration 16796/ 292968 | consumed samples: 34398208 | consumed tokens: 17985388544 | elapsed time per iteration (ms): 148845.1 | learning rate: 5.937E-05 | global batch size: 2048 | lm loss: 2.687012E+00 | loss scale: 65536.0 | grad norm: 63173.809 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.47 | iteration 16797/ 292968 | consumed samples: 34400256 | consumed tokens: 17987403776 | elapsed time per iteration (ms): 150355.7 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.685807E+00 | loss scale: 65536.0 | grad norm: 84019.608 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.59 | iteration 16798/ 292968 | consumed samples: 34402304 | consumed tokens: 17989419008 | elapsed time per iteration (ms): 150594.5 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.664490E+00 | loss scale: 65536.0 | grad norm: 48809.713 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.46 | iteration 16799/ 292968 | consumed samples: 34404352 | consumed tokens: 17991434240 | elapsed time per iteration (ms): 148713.8 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.690927E+00 | loss scale: 65536.0 | grad norm: 60026.131 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.55 | iteration 16800/ 292968 | consumed samples: 34406400 | consumed tokens: 17993449472 | elapsed time per iteration (ms): 152517.0 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.668995E+00 | loss scale: 65536.0 | grad norm: 95831.381 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 85.37 | ------------------------------------------------------------------------------------------- valid loss at iteration 16800 | lm loss value: 3.090558E+00 | lm loss PPL: 2.198934E+01 | ------------------------------------------------------------------------------------------- saving checkpoint at iteration 16800 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 [2022-02-06 07:37:25,113] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/mp_rank_00_model_states.pt [2022-02-06 07:37:25,459] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/mp_rank_01_model_states.pt [2022-02-06 07:38:51,902] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-02-06 07:38:51,961] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-02-06 07:38:52,316] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-02-06 07:38:52,543] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-02-06 07:38:53,022] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-02-06 07:38:54,032] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-02-06 07:38:54,136] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-02-06 07:38:54,293] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-02-06 07:38:54,442] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-02-06 07:38:54,444] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-02-06 07:38:54,701] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-02-06 07:38:55,230] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_107_optim_states.pt 
[2022-02-06 07:38:55,314] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-02-06 07:38:55,327] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-02-06 07:38:55,398] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-02-06 07:38:55,412] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-02-06 07:38:55,642] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-02-06 07:38:55,787] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-02-06 07:38:56,004] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-02-06 07:38:56,238] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-02-06 07:38:56,395] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-02-06 07:38:56,527] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-02-06 07:38:56,610] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-02-06 07:38:56,616] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-02-06 07:38:56,962] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-02-06 07:38:57,359] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-02-06 07:38:57,717] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-02-06 07:38:58,392] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-02-06 07:39:00,709] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-02-06 07:39:00,735] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-02-06 07:39:02,190] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-02-06 07:39:02,569] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-02-06 07:39:02,606] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-02-06 07:39:02,921] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-02-06 07:39:03,373] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-02-06 07:39:03,505] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-02-06 07:39:03,641] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-02-06 07:39:03,874] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-02-06 07:39:03,933] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-02-06 07:39:04,170] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-02-06 07:39:04,257] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-02-06 07:39:04,424] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-02-06 07:39:04,557] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-02-06 07:39:04,650] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-02-06 07:39:04,759] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-02-06 07:39:04,779] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-02-06 07:39:04,900] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-02-06 
07:39:04,968] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-02-06 07:39:05,058] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-02-06 07:39:05,093] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-02-06 07:39:05,091] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-02-06 07:39:05,133] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-02-06 07:39:05,228] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-02-06 07:39:05,230] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-02-06 07:39:05,361] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-02-06 07:39:05,491] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-02-06 07:39:05,554] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-02-06 07:39:05,650] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-02-06 07:39:05,737] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-02-06 07:39:05,860] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-02-06 07:39:05,808] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-02-06 07:39:05,953] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-02-06 07:39:06,000] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-02-06 07:39:05,956] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-02-06 07:39:06,136] [INFO] [engine.py:3023:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-02-06 07:39:06,163] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-02-06 07:39:06,467] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-02-06 07:39:06,681] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-02-06 07:39:06,876] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-02-06 07:39:07,154] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-02-06 07:39:07,271] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-02-06 07:39:07,344] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-02-06 07:39:07,436] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-02-06 07:39:07,869] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-02-06 07:39:08,606] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-02-06 07:39:08,671] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-02-06 07:39:08,858] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-02-06 07:39:08,896] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-02-06 07:39:09,002] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-02-06 07:39:08,913] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-02-06 07:39:09,005] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-02-06 07:39:09,135] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-02-06 
07:39:09,477] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-02-06 07:39:09,515] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-02-06 07:39:09,824] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-02-06 07:39:10,192] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-02-06 07:39:10,179] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-02-06 07:39:10,423] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-02-06 07:39:10,428] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-02-06 07:39:10,538] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-02-06 07:39:10,647] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-02-06 07:39:10,717] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-02-06 07:39:11,036] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-02-06 07:39:11,254] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-02-06 07:39:11,385] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-02-06 07:39:11,430] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-02-06 07:39:11,595] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-02-06 07:39:11,667] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-02-06 07:39:11,742] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-02-06 07:39:11,708] [INFO] [engine.py:3023:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-02-06 07:39:12,943] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-02-06 07:39:13,448] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-02-06 07:39:13,479] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-02-06 07:39:13,524] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-02-06 07:39:13,585] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-02-06 07:39:13,678] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-02-06 07:39:15,006] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-02-06 07:39:15,053] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-02-06 07:39:15,628] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-02-06 07:39:15,663] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-02-06 07:39:15,823] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-02-06 07:39:15,923] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-02-06 07:39:15,997] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-02-06 07:39:16,035] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-02-06 07:39:16,316] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-02-06 07:39:17,575] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_124_optim_states.pt [2022-02-06 07:39:17,774] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-02-06 
07:39:19,245] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-02-06 07:39:19,443] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-02-06 07:39:20,173] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-02-06 07:39:20,206] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-02-06 07:39:20,803] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-02-06 07:39:20,825] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-02-06 07:39:22,321] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-02-06 07:39:22,498] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-02-06 07:39:25,435] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-02-06 07:39:29,922] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-02-06 07:39:33,451] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16800/zero_pp_rank_0_mp_rank_38_optim_states.pt successfully saved checkpoint at iteration 16800 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 time (ms) | save-checkpoint: 131008.03 iteration 16801/ 292968 | consumed samples: 34408448 | consumed tokens: 17995464704 | elapsed time per iteration (ms): 749443.1 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.681278E+00 | loss scale: 65536.0 | grad norm: 54335.538 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.003 | TFLOPs: 17.37 | iteration 16802/ 292968 | consumed samples: 34410496 | consumed tokens: 17997479936 | elapsed time per iteration (ms): 155514.5 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.684608E+00 | loss scale: 65536.0 | grad norm: 69047.191 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.72 | iteration 16803/ 292968 | consumed samples: 34412544 | consumed tokens: 17999495168 | elapsed time per iteration (ms): 155925.0 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.691563E+00 | loss scale: 65536.0 | grad norm: 103153.481 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.50 | iteration 16804/ 292968 | consumed samples: 34414592 
| consumed tokens: 18001510400 | elapsed time per iteration (ms): 153416.8 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.683760E+00 | loss scale: 65536.0 | grad norm: 48444.067 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.87 | iteration 16805/ 292968 | consumed samples: 34416640 | consumed tokens: 18003525632 | elapsed time per iteration (ms): 153660.9 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.702456E+00 | loss scale: 65536.0 | grad norm: 125194.707 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.73 | iteration 16806/ 292968 | consumed samples: 34418688 | consumed tokens: 18005540864 | elapsed time per iteration (ms): 151255.7 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.677657E+00 | loss scale: 65536.0 | grad norm: 55952.916 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.08 | iteration 16807/ 292968 | consumed samples: 34420736 | consumed tokens: 18007556096 | elapsed time per iteration (ms): 151025.1 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.711150E+00 | loss scale: 65536.0 | grad norm: 95950.547 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.21 | iteration 16808/ 292968 | consumed samples: 34422784 | consumed tokens: 18009571328 | elapsed time per iteration (ms): 149805.0 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.681709E+00 | loss scale: 65536.0 | grad norm: 60945.916 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.91 | iteration 16809/ 292968 | consumed 
samples: 34424832 | consumed tokens: 18011586560 | elapsed time per iteration (ms): 150778.7 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.711612E+00 | loss scale: 65536.0 | grad norm: 70210.794 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.35 | iteration 16810/ 292968 | consumed samples: 34426880 | consumed tokens: 18013601792 | elapsed time per iteration (ms): 150522.5 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.680050E+00 | loss scale: 65536.0 | grad norm: 65941.987 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.50 | iteration 16811/ 292968 | consumed samples: 34428928 | consumed tokens: 18015617024 | elapsed time per iteration (ms): 152820.5 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.706586E+00 | loss scale: 65536.0 | grad norm: 78640.126 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 85.20 | iteration 16812/ 292968 | consumed samples: 34430976 | consumed tokens: 18017632256 | elapsed time per iteration (ms): 151592.4 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.662695E+00 | loss scale: 65536.0 | grad norm: 69792.855 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 85.89 | iteration 16813/ 292968 | consumed samples: 34433024 | consumed tokens: 18019647488 | elapsed time per iteration (ms): 149559.5 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.689743E+00 | loss scale: 65536.0 | grad norm: 62862.987 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.05 | iteration 16814/ 292968 
| consumed samples: 34435072 | consumed tokens: 18021662720 | elapsed time per iteration (ms): 149597.7 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.672374E+00 | loss scale: 65536.0 | grad norm: 67595.123 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.03 | iteration 16815/ 292968 | consumed samples: 34437120 | consumed tokens: 18023677952 | elapsed time per iteration (ms): 149534.4 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.707298E+00 | loss scale: 65536.0 | grad norm: 58169.667 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.07 | iteration 16816/ 292968 | consumed samples: 34439168 | consumed tokens: 18025693184 | elapsed time per iteration (ms): 149654.4 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.704958E+00 | loss scale: 65536.0 | grad norm: 58331.039 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.00 | iteration 16817/ 292968 | consumed samples: 34441216 | consumed tokens: 18027708416 | elapsed time per iteration (ms): 149549.6 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.664627E+00 | loss scale: 65536.0 | grad norm: 62834.467 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.06 | iteration 16818/ 292968 | consumed samples: 34443264 | consumed tokens: 18029723648 | elapsed time per iteration (ms): 150363.8 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.708677E+00 | loss scale: 65536.0 | grad norm: 86804.834 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.59 | iteration 
16819/ 292968 | consumed samples: 34445312 | consumed tokens: 18031738880 | elapsed time per iteration (ms): 151278.2 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.680396E+00 | loss scale: 65536.0 | grad norm: 61043.131 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.07 | iteration 16820/ 292968 | consumed samples: 34447360 | consumed tokens: 18033754112 | elapsed time per iteration (ms): 150481.1 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.691410E+00 | loss scale: 65536.0 | grad norm: 115110.676 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.52 | iteration 16821/ 292968 | consumed samples: 34449408 | consumed tokens: 18035769344 | elapsed time per iteration (ms): 150666.6 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.685491E+00 | loss scale: 65536.0 | grad norm: 52302.076 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.41 | iteration 16822/ 292968 | consumed samples: 34451456 | consumed tokens: 18037784576 | elapsed time per iteration (ms): 150626.1 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.681656E+00 | loss scale: 65536.0 | grad norm: 143318.187 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.44 | iteration 16823/ 292968 | consumed samples: 34453504 | consumed tokens: 18039799808 | elapsed time per iteration (ms): 150031.8 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.701643E+00 | loss scale: 65536.0 | grad norm: 72249.426 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.78 
| iteration 16824/ 292968 | consumed samples: 34455552 | consumed tokens: 18041815040 | elapsed time per iteration (ms): 150258.9 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.693412E+00 | loss scale: 65536.0 | grad norm: 139406.389 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.65 | iteration 16825/ 292968 | consumed samples: 34457600 | consumed tokens: 18043830272 | elapsed time per iteration (ms): 149306.0 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.717791E+00 | loss scale: 65536.0 | grad norm: 122857.799 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.20 | iteration 16826/ 292968 | consumed samples: 34459648 | consumed tokens: 18045845504 | elapsed time per iteration (ms): 149619.9 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.713517E+00 | loss scale: 65536.0 | grad norm: 98376.174 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.02 | iteration 16827/ 292968 | consumed samples: 34461696 | consumed tokens: 18047860736 | elapsed time per iteration (ms): 149101.9 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.707080E+00 | loss scale: 65536.0 | grad norm: 72892.136 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.32 | iteration 16828/ 292968 | consumed samples: 34463744 | consumed tokens: 18049875968 | elapsed time per iteration (ms): 149085.2 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.700552E+00 | loss scale: 65536.0 | grad norm: 84452.871 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | 
TFLOPs: 87.33 | iteration 16829/ 292968 | consumed samples: 34465792 | consumed tokens: 18051891200 | elapsed time per iteration (ms): 149044.8 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.705681E+00 | loss scale: 65536.0 | grad norm: 62250.861 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.35 | iteration 16830/ 292968 | consumed samples: 34467840 | consumed tokens: 18053906432 | elapsed time per iteration (ms): 148767.1 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.696596E+00 | loss scale: 65536.0 | grad norm: 77939.298 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.52 | iteration 16831/ 292968 | consumed samples: 34469888 | consumed tokens: 18055921664 | elapsed time per iteration (ms): 148928.6 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.691169E+00 | loss scale: 65536.0 | grad norm: 46157.408 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.42 | iteration 16832/ 292968 | consumed samples: 34471936 | consumed tokens: 18057936896 | elapsed time per iteration (ms): 148888.6 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.678659E+00 | loss scale: 65536.0 | grad norm: 69337.468 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.45 | iteration 16833/ 292968 | consumed samples: 34473984 | consumed tokens: 18059952128 | elapsed time per iteration (ms): 148986.2 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.685564E+00 | loss scale: 65536.0 | grad norm: 90934.913 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 0.014 | TFLOPs: 87.39 | iteration 16834/ 292968 | consumed samples: 34476032 | consumed tokens: 18061967360 | elapsed time per iteration (ms): 149014.2 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.694111E+00 | loss scale: 65536.0 | grad norm: 56748.909 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.37 | iteration 16835/ 292968 | consumed samples: 34478080 | consumed tokens: 18063982592 | elapsed time per iteration (ms): 148992.5 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.688596E+00 | loss scale: 65536.0 | grad norm: 47853.670 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.39 | iteration 16836/ 292968 | consumed samples: 34480128 | consumed tokens: 18065997824 | elapsed time per iteration (ms): 148503.6 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.672335E+00 | loss scale: 65536.0 | grad norm: 62843.496 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.67 | iteration 16837/ 292968 | consumed samples: 34482176 | consumed tokens: 18068013056 | elapsed time per iteration (ms): 149931.7 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.692216E+00 | loss scale: 65536.0 | grad norm: 84548.901 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.84 | iteration 16838/ 292968 | consumed samples: 34484224 | consumed tokens: 18070028288 | elapsed time per iteration (ms): 148833.9 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.686199E+00 | loss scale: 65536.0 | grad norm: 68583.328 | num zeros: 0.0 | curriculum seqlen: 984 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 0.014 | TFLOPs: 87.48 | iteration 16839/ 292968 | consumed samples: 34486272 | consumed tokens: 18072059904 | elapsed time per iteration (ms): 149032.3 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.694147E+00 | loss scale: 65536.0 | grad norm: 87801.642 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.07 | iteration 16840/ 292968 | consumed samples: 34488320 | consumed tokens: 18074091520 | elapsed time per iteration (ms): 148973.0 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.676204E+00 | loss scale: 65536.0 | grad norm: 54271.984 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.11 | iteration 16841/ 292968 | consumed samples: 34490368 | consumed tokens: 18076123136 | elapsed time per iteration (ms): 150039.7 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.716986E+00 | loss scale: 65536.0 | grad norm: 95999.938 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.48 | iteration 16842/ 292968 | consumed samples: 34492416 | consumed tokens: 18078154752 | elapsed time per iteration (ms): 149416.5 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.673957E+00 | loss scale: 65536.0 | grad norm: 49012.687 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.85 | iteration 16843/ 292968 | consumed samples: 34494464 | consumed tokens: 18080186368 | elapsed time per iteration (ms): 149944.0 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.685158E+00 | loss scale: 65536.0 | grad norm: 73880.311 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 0.014 | TFLOPs: 87.54 | iteration 16844/ 292968 | consumed samples: 34496512 | consumed tokens: 18082217984 | elapsed time per iteration (ms): 155146.5 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.702117E+00 | loss scale: 65536.0 | grad norm: 63830.340 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.60 | iteration 16845/ 292968 | consumed samples: 34498560 | consumed tokens: 18084249600 | elapsed time per iteration (ms): 149307.7 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.669757E+00 | loss scale: 65536.0 | grad norm: 66966.068 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.91 | iteration 16846/ 292968 | consumed samples: 34500608 | consumed tokens: 18086281216 | elapsed time per iteration (ms): 149936.5 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.692608E+00 | loss scale: 65536.0 | grad norm: 62815.831 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.54 | iteration 16847/ 292968 | consumed samples: 34502656 | consumed tokens: 18088312832 | elapsed time per iteration (ms): 149654.7 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.700192E+00 | loss scale: 65536.0 | grad norm: 59883.268 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.71 | iteration 16848/ 292968 | consumed samples: 34504704 | consumed tokens: 18090344448 | elapsed time per iteration (ms): 149586.9 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.683215E+00 | loss scale: 65536.0 | grad norm: 78038.844 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.75 | iteration 16849/ 292968 | consumed samples: 34506752 | consumed tokens: 18092376064 | elapsed time per iteration (ms): 149091.9 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.679176E+00 | loss scale: 65536.0 | grad norm: 68234.927 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.04 | iteration 16850/ 292968 | consumed samples: 34508800 | consumed tokens: 18094407680 | elapsed time per iteration (ms): 150318.9 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.700952E+00 | loss scale: 65536.0 | grad norm: 90241.598 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.32 | saving checkpoint at iteration 16850 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 [2022-02-06 09:45:27,772] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/mp_rank_01_model_states.pt [2022-02-06 09:45:27,789] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/mp_rank_00_model_states.pt [2022-02-06 09:45:53,007] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-02-06 09:45:53,257] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-02-06 09:45:53,611] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-02-06 09:45:54,191] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-02-06 09:45:54,318] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-02-06 09:45:54,847] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-02-06 09:45:55,260] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-02-06 09:45:55,543] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-02-06 09:45:55,870] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-02-06 09:45:55,895] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-02-06 09:45:55,920] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-02-06 09:45:55,991] [INFO] [engine.py:3023:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-02-06 09:45:56,461] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-02-06 09:45:56,731] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-02-06 09:45:56,772] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-02-06 09:45:56,794] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-02-06 09:45:56,835] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-02-06 09:45:56,893] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-02-06 09:45:57,365] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-02-06 09:45:57,792] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-02-06 09:45:58,061] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-02-06 09:45:58,092] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-02-06 09:45:58,130] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-02-06 09:45:58,154] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-02-06 09:45:59,501] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-02-06 09:45:59,588] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-02-06 09:45:59,724] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-02-06 09:45:59,962] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-02-06 09:46:00,642] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-02-06 
09:46:01,561] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-02-06 09:46:01,560] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-02-06 09:46:01,854] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-02-06 09:46:02,106] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-02-06 09:46:02,176] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-02-06 09:46:02,258] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-02-06 09:46:02,306] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-02-06 09:46:02,350] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-02-06 09:46:03,023] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-02-06 09:46:03,044] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-02-06 09:46:03,084] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-02-06 09:46:03,120] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-02-06 09:46:03,190] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-02-06 09:46:03,852] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-02-06 09:46:04,013] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-02-06 09:46:04,422] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-02-06 09:46:04,750] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-02-06 09:46:05,206] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-02-06 09:46:05,607] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-02-06 09:46:05,747] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-02-06 09:46:06,253] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-02-06 09:46:06,401] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-02-06 09:46:06,481] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-02-06 09:46:06,548] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-02-06 09:46:06,601] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-02-06 09:46:06,709] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-02-06 09:46:07,479] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-02-06 09:46:08,293] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-02-06 09:46:08,630] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-02-06 09:46:08,746] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-02-06 09:46:08,850] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-02-06 09:46:08,861] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-02-06 09:46:08,987] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-02-06 09:46:08,899] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-02-06 09:46:09,055] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-02-06 
09:46:09,104] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-02-06 09:46:09,201] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-02-06 09:46:09,299] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-02-06 09:46:09,358] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-02-06 09:46:09,456] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-02-06 09:46:09,523] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-02-06 09:46:09,461] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-02-06 09:46:09,644] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-02-06 09:46:09,813] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-02-06 09:46:09,736] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-02-06 09:46:09,995] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-02-06 09:46:10,148] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-02-06 09:46:10,345] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-02-06 09:46:10,393] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-02-06 09:46:10,449] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-02-06 09:46:10,437] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-02-06 09:46:10,486] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-02-06 09:46:10,469] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-02-06 09:46:10,509] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-02-06 09:46:10,584] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-02-06 09:46:10,629] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-02-06 09:46:10,678] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-02-06 09:46:10,960] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-02-06 09:46:11,002] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-02-06 09:46:11,144] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-02-06 09:46:11,193] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-02-06 09:46:11,122] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-02-06 09:46:11,454] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-02-06 09:46:11,493] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-02-06 09:46:11,508] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-02-06 09:46:11,764] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-02-06 09:46:11,772] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-02-06 09:46:12,604] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-02-06 09:46:12,612] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-02-06 09:46:12,658] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-02-06 
09:46:13,139] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-02-06 09:46:13,583] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-02-06 09:46:13,617] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-02-06 09:46:13,639] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-02-06 09:46:13,735] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-02-06 09:46:14,092] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-02-06 09:46:14,626] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-02-06 09:46:14,638] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-02-06 09:46:14,661] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-02-06 09:46:15,292] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-02-06 09:46:15,295] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-02-06 09:46:15,812] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-02-06 09:46:15,965] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-02-06 09:46:16,269] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-02-06 09:46:16,571] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-02-06 09:46:16,595] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_124_optim_states.pt [2022-02-06 09:46:16,810] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-02-06 09:46:16,883] [INFO] [engine.py:3023:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-02-06 09:46:17,121] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-02-06 09:46:18,156] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-02-06 09:46:18,247] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-02-06 09:46:18,351] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-02-06 09:46:19,326] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-02-06 09:46:21,445] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-02-06 09:46:21,491] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-02-06 09:46:21,777] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-02-06 09:46:21,835] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-02-06 09:46:21,972] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-02-06 09:46:21,974] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16850/zero_pp_rank_0_mp_rank_02_optim_states.pt successfully saved checkpoint at iteration 16850 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 time (ms) | save-checkpoint: 81870.53 iteration 16851/ 292968 | consumed samples: 34510848 | consumed tokens: 18096439296 | elapsed time per iteration (ms): 230806.0 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.701359E+00 | loss scale: 65536.0 | grad norm: 67984.778 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.009 | TFLOPs: 56.87 | iteration 16852/ 292968 | consumed samples: 34512896 | consumed tokens: 18098470912 | elapsed time per iteration (ms): 149189.4 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.696858E+00 | loss scale: 65536.0 | grad norm: 62924.953 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.98 | iteration 16853/ 292968 | consumed samples: 34514944 | consumed tokens: 18100502528 | elapsed time per iteration (ms): 149074.2 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.729046E+00 | loss scale: 65536.0 | grad norm: 70135.087 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 
88.05 | iteration 16854/ 292968 | consumed samples: 34516992 | consumed tokens: 18102534144 | elapsed time per iteration (ms): 148830.7 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.717540E+00 | loss scale: 65536.0 | grad norm: 54172.944 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.19 | iteration 16855/ 292968 | consumed samples: 34519040 | consumed tokens: 18104565760 | elapsed time per iteration (ms): 149694.8 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.730820E+00 | loss scale: 65536.0 | grad norm: 71509.432 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.68 | iteration 16856/ 292968 | consumed samples: 34521088 | consumed tokens: 18106597376 | elapsed time per iteration (ms): 149776.6 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.697764E+00 | loss scale: 65536.0 | grad norm: 87200.709 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.63 | iteration 16857/ 292968 | consumed samples: 34523136 | consumed tokens: 18108628992 | elapsed time per iteration (ms): 149191.6 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.692329E+00 | loss scale: 65536.0 | grad norm: 56838.606 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.98 | iteration 16858/ 292968 | consumed samples: 34525184 | consumed tokens: 18110660608 | elapsed time per iteration (ms): 149251.5 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.691914E+00 | loss scale: 65536.0 | grad norm: 79451.331 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 
| TFLOPs: 87.94 | iteration 16859/ 292968 | consumed samples: 34527232 | consumed tokens: 18112692224 | elapsed time per iteration (ms): 149594.5 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.688902E+00 | loss scale: 65536.0 | grad norm: 61173.307 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.74 | iteration 16860/ 292968 | consumed samples: 34529280 | consumed tokens: 18114723840 | elapsed time per iteration (ms): 149861.0 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.696067E+00 | loss scale: 65536.0 | grad norm: 60471.523 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.59 | iteration 16861/ 292968 | consumed samples: 34531328 | consumed tokens: 18116755456 | elapsed time per iteration (ms): 149626.1 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.675402E+00 | loss scale: 65536.0 | grad norm: 59672.490 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.72 | iteration 16862/ 292968 | consumed samples: 34533376 | consumed tokens: 18118787072 | elapsed time per iteration (ms): 149708.0 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.689071E+00 | loss scale: 65536.0 | grad norm: 65053.462 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.67 | iteration 16863/ 292968 | consumed samples: 34535424 | consumed tokens: 18120818688 | elapsed time per iteration (ms): 149161.7 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.676798E+00 | loss scale: 65536.0 | grad norm: 64082.192 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 0.014 | TFLOPs: 88.00 | iteration 16864/ 292968 | consumed samples: 34537472 | consumed tokens: 18122850304 | elapsed time per iteration (ms): 149158.7 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.684850E+00 | loss scale: 65536.0 | grad norm: 84990.097 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.00 | iteration 16865/ 292968 | consumed samples: 34539520 | consumed tokens: 18124881920 | elapsed time per iteration (ms): 149302.6 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.695620E+00 | loss scale: 65536.0 | grad norm: 70036.687 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.91 | iteration 16866/ 292968 | consumed samples: 34541568 | consumed tokens: 18126913536 | elapsed time per iteration (ms): 149411.2 | learning rate: 5.936E-05 | global batch size: 2048 | lm loss: 2.705672E+00 | loss scale: 65536.0 | grad norm: 44190.965 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.85 | iteration 16867/ 292968 | consumed samples: 34543616 | consumed tokens: 18128945152 | elapsed time per iteration (ms): 155241.9 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.704927E+00 | loss scale: 65536.0 | grad norm: 57993.091 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.55 | iteration 16868/ 292968 | consumed samples: 34545664 | consumed tokens: 18130976768 | elapsed time per iteration (ms): 149379.6 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.704222E+00 | loss scale: 65536.0 | grad norm: 81291.594 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 0.014 | TFLOPs: 87.87 | iteration 16869/ 292968 | consumed samples: 34547712 | consumed tokens: 18133008384 | elapsed time per iteration (ms): 149026.9 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.678150E+00 | loss scale: 65536.0 | grad norm: 50333.919 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.08 | iteration 16870/ 292968 | consumed samples: 34549760 | consumed tokens: 18135040000 | elapsed time per iteration (ms): 149341.0 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.692966E+00 | loss scale: 65536.0 | grad norm: 65192.061 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.89 | iteration 16871/ 292968 | consumed samples: 34551808 | consumed tokens: 18137071616 | elapsed time per iteration (ms): 148954.4 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.705772E+00 | loss scale: 65536.0 | grad norm: 68710.188 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.12 | iteration 16872/ 292968 | consumed samples: 34553856 | consumed tokens: 18139103232 | elapsed time per iteration (ms): 149127.4 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.664180E+00 | loss scale: 65536.0 | grad norm: 62560.137 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.02 | iteration 16873/ 292968 | consumed samples: 34555904 | consumed tokens: 18141134848 | elapsed time per iteration (ms): 149643.3 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.683766E+00 | loss scale: 65536.0 | grad norm: 96028.495 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 0.014 | TFLOPs: 87.71 | iteration 16874/ 292968 | consumed samples: 34557952 | consumed tokens: 18143166464 | elapsed time per iteration (ms): 149116.3 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.677366E+00 | loss scale: 65536.0 | grad norm: 46896.647 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.02 | iteration 16875/ 292968 | consumed samples: 34560000 | consumed tokens: 18145198080 | elapsed time per iteration (ms): 149874.3 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.667748E+00 | loss scale: 65536.0 | grad norm: 97861.768 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.58 | iteration 16876/ 292968 | consumed samples: 34562048 | consumed tokens: 18147229696 | elapsed time per iteration (ms): 150011.8 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.703907E+00 | loss scale: 65536.0 | grad norm: 58506.412 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.50 | iteration 16877/ 292968 | consumed samples: 34564096 | consumed tokens: 18149261312 | elapsed time per iteration (ms): 149956.5 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.675104E+00 | loss scale: 65536.0 | grad norm: 74413.953 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.53 | iteration 16878/ 292968 | consumed samples: 34566144 | consumed tokens: 18151292928 | elapsed time per iteration (ms): 149387.7 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.706047E+00 | loss scale: 65536.0 | grad norm: 63101.002 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.86 | iteration 16879/ 292968 | consumed samples: 34568192 | consumed tokens: 18153324544 | elapsed time per iteration (ms): 149356.6 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.699156E+00 | loss scale: 65536.0 | grad norm: 80855.091 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.88 | iteration 16880/ 292968 | consumed samples: 34570240 | consumed tokens: 18155356160 | elapsed time per iteration (ms): 149400.3 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.702178E+00 | loss scale: 65536.0 | grad norm: 69333.481 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.86 | iteration 16881/ 292968 | consumed samples: 34572288 | consumed tokens: 18157387776 | elapsed time per iteration (ms): 149290.4 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.716751E+00 | loss scale: 65536.0 | grad norm: 101426.859 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.92 | iteration 16882/ 292968 | consumed samples: 34574336 | consumed tokens: 18159419392 | elapsed time per iteration (ms): 150328.7 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.668438E+00 | loss scale: 65536.0 | grad norm: 49056.763 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.31 | iteration 16883/ 292968 | consumed samples: 34576384 | consumed tokens: 18161451008 | elapsed time per iteration (ms): 149260.0 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.718268E+00 | loss scale: 65536.0 | grad norm: 97164.187 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.94 | iteration 16884/ 292968 | consumed samples: 34578432 | consumed tokens: 18163482624 | elapsed time per iteration (ms): 149050.9 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.686927E+00 | loss scale: 65536.0 | grad norm: 75395.159 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.06 | iteration 16885/ 292968 | consumed samples: 34580480 | consumed tokens: 18165514240 | elapsed time per iteration (ms): 149780.1 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.699965E+00 | loss scale: 65536.0 | grad norm: 67542.285 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.63 | iteration 16886/ 292968 | consumed samples: 34582528 | consumed tokens: 18167545856 | elapsed time per iteration (ms): 149608.2 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.683887E+00 | loss scale: 65536.0 | grad norm: 86444.334 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.73 | iteration 16887/ 292968 | consumed samples: 34584576 | consumed tokens: 18169577472 | elapsed time per iteration (ms): 149101.1 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.712900E+00 | loss scale: 65536.0 | grad norm: 52466.112 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.03 | iteration 16888/ 292968 | consumed samples: 34586624 | consumed tokens: 18171609088 | elapsed time per iteration (ms): 149303.4 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.690523E+00 | loss scale: 65536.0 | grad norm: 77866.865 | num zeros: 0.0 | curriculum seqlen: 992 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.91 | iteration 16889/ 292968 | consumed samples: 34588672 | consumed tokens: 18173640704 | elapsed time per iteration (ms): 149675.6 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.710894E+00 | loss scale: 65536.0 | grad norm: 72046.235 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.69 | iteration 16890/ 292968 | consumed samples: 34590720 | consumed tokens: 18175672320 | elapsed time per iteration (ms): 150045.4 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.735796E+00 | loss scale: 65536.0 | grad norm: 85867.259 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.48 | iteration 16891/ 292968 | consumed samples: 34592768 | consumed tokens: 18177703936 | elapsed time per iteration (ms): 151399.9 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.681190E+00 | loss scale: 65536.0 | grad norm: 71213.624 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.70 | iteration 16892/ 292968 | consumed samples: 34594816 | consumed tokens: 18179735552 | elapsed time per iteration (ms): 150614.2 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.702312E+00 | loss scale: 65536.0 | grad norm: 61816.627 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.15 | iteration 16893/ 292968 | consumed samples: 34596864 | consumed tokens: 18181767168 | elapsed time per iteration (ms): 150576.8 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.702690E+00 | loss scale: 65536.0 | grad norm: 80877.004 | num zeros: 0.0 | curriculum seqlen: 
992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.17 | iteration 16894/ 292968 | consumed samples: 34598912 | consumed tokens: 18183798784 | elapsed time per iteration (ms): 149034.8 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.701026E+00 | loss scale: 65536.0 | grad norm: 60669.161 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.07 | iteration 16895/ 292968 | consumed samples: 34600960 | consumed tokens: 18185830400 | elapsed time per iteration (ms): 149874.9 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.701796E+00 | loss scale: 65536.0 | grad norm: 79484.942 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.58 | iteration 16896/ 292968 | consumed samples: 34603008 | consumed tokens: 18187862016 | elapsed time per iteration (ms): 149084.9 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.694329E+00 | loss scale: 65536.0 | grad norm: 58291.998 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.04 | iteration 16897/ 292968 | consumed samples: 34605056 | consumed tokens: 18189893632 | elapsed time per iteration (ms): 150054.3 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.698099E+00 | loss scale: 65536.0 | grad norm: 60560.736 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.47 | iteration 16898/ 292968 | consumed samples: 34607104 | consumed tokens: 18191925248 | elapsed time per iteration (ms): 149844.1 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.704518E+00 | loss scale: 65536.0 | grad norm: 83201.578 | num zeros: 0.0 | 
curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.60 | iteration 16899/ 292968 | consumed samples: 34609152 | consumed tokens: 18193956864 | elapsed time per iteration (ms): 149428.8 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.706546E+00 | loss scale: 65536.0 | grad norm: 60847.407 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.84 | iteration 16900/ 292968 | consumed samples: 34611200 | consumed tokens: 18195988480 | elapsed time per iteration (ms): 149774.4 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.708802E+00 | loss scale: 65536.0 | grad norm: 56067.424 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.64 | saving checkpoint at iteration 16900 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 [2022-02-06 11:51:09,663] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/mp_rank_00_model_states.pt [2022-02-06 11:51:09,837] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/mp_rank_01_model_states.pt [2022-02-06 11:51:24,340] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-02-06 11:51:25,210] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-02-06 11:51:25,294] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint 
saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-02-06 11:51:25,400] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-02-06 11:51:25,969] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-02-06 11:51:26,084] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-02-06 11:51:26,394] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-02-06 11:51:26,440] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-02-06 11:51:26,945] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-02-06 11:51:27,176] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-02-06 11:51:27,322] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-02-06 11:51:27,434] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-02-06 11:51:27,483] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-02-06 11:51:27,718] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-02-06 11:51:27,834] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-02-06 11:51:27,930] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-02-06 11:51:28,071] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-02-06 11:51:28,166] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-02-06 11:51:28,175] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-02-06 11:51:28,186] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_96_optim_states.pt 
[2022-02-06 11:51:28,212] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-02-06 11:51:28,217] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-02-06 11:51:28,570] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-02-06 11:51:28,577] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-02-06 11:51:33,269] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-02-06 11:51:33,389] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-02-06 11:51:33,418] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-02-06 11:51:33,497] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-02-06 11:51:33,811] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-02-06 11:51:34,547] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-02-06 11:51:34,586] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-02-06 11:51:34,641] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-02-06 11:51:34,970] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-02-06 11:51:35,420] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-02-06 11:51:35,466] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-02-06 11:51:36,212] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-02-06 11:51:36,951] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-02-06 11:51:36,970] [INFO] [engine.py:3023:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-02-06 11:51:37,003] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-02-06 11:51:37,403] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-02-06 11:51:37,415] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-02-06 11:51:37,625] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-02-06 11:51:37,889] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-02-06 11:51:37,955] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-02-06 11:51:38,031] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-02-06 11:51:38,244] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-02-06 11:51:38,247] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-02-06 11:51:38,452] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-02-06 11:51:38,742] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-02-06 11:51:38,807] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-02-06 11:51:38,760] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-02-06 11:51:39,046] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-02-06 11:51:39,701] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-02-06 11:51:39,758] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-02-06 11:51:39,880] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-02-06 
11:51:40,297] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-02-06 11:51:40,314] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-02-06 11:51:40,436] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-02-06 11:51:40,491] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-02-06 11:51:40,659] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-02-06 11:51:40,706] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-02-06 11:51:40,818] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-02-06 11:51:40,821] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-02-06 11:51:40,823] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-02-06 11:51:41,329] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-02-06 11:51:41,785] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-02-06 11:51:41,823] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-02-06 11:51:41,878] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-02-06 11:51:41,888] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-02-06 11:51:41,943] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-02-06 11:51:42,445] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-02-06 11:51:42,474] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-02-06 11:51:43,201] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-02-06 11:51:43,254] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-02-06 11:51:43,297] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-02-06 11:51:43,514] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-02-06 11:51:43,526] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-02-06 11:51:43,577] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-02-06 11:51:43,619] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-02-06 11:51:43,705] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-02-06 11:51:43,801] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-02-06 11:51:44,058] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-02-06 11:51:44,061] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-02-06 11:51:44,171] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-02-06 11:51:44,219] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-02-06 11:51:44,287] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-02-06 11:51:44,428] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-02-06 11:51:44,448] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-02-06 11:51:44,516] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-02-06 11:51:44,556] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-02-06 
11:51:44,823] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-02-06 11:51:44,977] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-02-06 11:51:44,994] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-02-06 11:51:45,247] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-02-06 11:51:45,318] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-02-06 11:51:45,324] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-02-06 11:51:45,372] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-02-06 11:51:45,453] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-02-06 11:51:45,487] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-02-06 11:51:45,586] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_124_optim_states.pt [2022-02-06 11:51:45,597] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-02-06 11:51:45,639] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-02-06 11:51:45,729] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-02-06 11:51:45,772] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-02-06 11:51:46,055] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-02-06 11:51:46,191] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-02-06 11:51:46,282] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-02-06 11:51:46,489] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-02-06 11:51:46,623] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-02-06 11:51:46,658] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-02-06 11:51:46,663] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-02-06 11:51:46,708] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-02-06 11:51:46,720] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-02-06 11:51:46,820] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-02-06 11:51:46,849] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-02-06 11:51:47,062] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-02-06 11:51:47,084] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-02-06 11:51:47,373] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-02-06 11:51:47,783] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-02-06 11:51:47,878] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-02-06 11:51:47,999] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-02-06 11:51:48,053] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-02-06 11:51:48,112] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-02-06 11:51:48,150] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-02-06 11:51:48,259] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-02-06 
11:51:48,280] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-02-06 11:51:51,698] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-02-06 11:51:51,779] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16900/zero_pp_rank_0_mp_rank_50_optim_states.pt successfully saved checkpoint at iteration 16900 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 time (ms) | save-checkpoint: 47087.55 iteration 16901/ 292968 | consumed samples: 34613248 | consumed tokens: 18198020096 | elapsed time per iteration (ms): 196051.5 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.682006E+00 | loss scale: 65536.0 | grad norm: 58726.496 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.010 | TFLOPs: 66.95 | iteration 16902/ 292968 | consumed samples: 34615296 | consumed tokens: 18200051712 | elapsed time per iteration (ms): 157559.0 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.706696E+00 | loss scale: 65536.0 | grad norm: 93272.879 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.31 | iteration 16903/ 292968 | consumed samples: 34617344 | consumed tokens: 18202083328 | elapsed time per iteration (ms): 148914.3 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.716279E+00 | loss scale: 65536.0 | grad norm: 44924.101 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 0.014 | TFLOPs: 88.14 | iteration 16904/ 292968 | consumed samples: 34619392 | consumed tokens: 18204114944 | elapsed time per iteration (ms): 155121.2 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.721370E+00 | loss scale: 65536.0 | grad norm: 72441.112 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.62 | iteration 16905/ 292968 | consumed samples: 34621440 | consumed tokens: 18206146560 | elapsed time per iteration (ms): 149771.5 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.705163E+00 | loss scale: 65536.0 | grad norm: 53169.213 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.64 | iteration 16906/ 292968 | consumed samples: 34623488 | consumed tokens: 18208178176 | elapsed time per iteration (ms): 149191.2 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.708192E+00 | loss scale: 65536.0 | grad norm: 59208.324 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.98 | iteration 16907/ 292968 | consumed samples: 34625536 | consumed tokens: 18210209792 | elapsed time per iteration (ms): 149535.1 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.693417E+00 | loss scale: 65536.0 | grad norm: 60566.492 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.78 | iteration 16908/ 292968 | consumed samples: 34627584 | consumed tokens: 18212241408 | elapsed time per iteration (ms): 149045.4 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.716897E+00 | loss scale: 65536.0 | grad norm: 69673.113 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 0.014 | TFLOPs: 88.06 | iteration 16909/ 292968 | consumed samples: 34629632 | consumed tokens: 18214273024 | elapsed time per iteration (ms): 149202.5 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.713536E+00 | loss scale: 65536.0 | grad norm: 69901.028 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.97 | iteration 16910/ 292968 | consumed samples: 34631680 | consumed tokens: 18216304640 | elapsed time per iteration (ms): 149069.0 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.689873E+00 | loss scale: 65536.0 | grad norm: 80043.602 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.05 | iteration 16911/ 292968 | consumed samples: 34633728 | consumed tokens: 18218336256 | elapsed time per iteration (ms): 149802.8 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.711225E+00 | loss scale: 65536.0 | grad norm: 63280.912 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.62 | iteration 16912/ 292968 | consumed samples: 34635776 | consumed tokens: 18220367872 | elapsed time per iteration (ms): 150280.9 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.689829E+00 | loss scale: 65536.0 | grad norm: 55156.469 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.34 | iteration 16913/ 292968 | consumed samples: 34637824 | consumed tokens: 18222399488 | elapsed time per iteration (ms): 150393.8 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.719492E+00 | loss scale: 65536.0 | grad norm: 64316.798 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 0.014 | TFLOPs: 87.28 | iteration 16914/ 292968 | consumed samples: 34639872 | consumed tokens: 18224431104 | elapsed time per iteration (ms): 149345.6 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.710757E+00 | loss scale: 65536.0 | grad norm: 82950.852 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.89 | iteration 16915/ 292968 | consumed samples: 34641920 | consumed tokens: 18226462720 | elapsed time per iteration (ms): 149081.4 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.709783E+00 | loss scale: 65536.0 | grad norm: 62295.435 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.04 | iteration 16916/ 292968 | consumed samples: 34643968 | consumed tokens: 18228494336 | elapsed time per iteration (ms): 148863.9 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.700122E+00 | loss scale: 65536.0 | grad norm: 78853.741 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.17 | iteration 16917/ 292968 | consumed samples: 34646016 | consumed tokens: 18230525952 | elapsed time per iteration (ms): 148964.7 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.697964E+00 | loss scale: 65536.0 | grad norm: 59098.269 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.11 | iteration 16918/ 292968 | consumed samples: 34648064 | consumed tokens: 18232557568 | elapsed time per iteration (ms): 149067.6 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.726390E+00 | loss scale: 65536.0 | grad norm: 71372.266 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.05 | iteration 16919/ 292968 | consumed samples: 34650112 | consumed tokens: 18234589184 | elapsed time per iteration (ms): 149272.1 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.686197E+00 | loss scale: 65536.0 | grad norm: 71303.358 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.93 | iteration 16920/ 292968 | consumed samples: 34652160 | consumed tokens: 18236620800 | elapsed time per iteration (ms): 149608.7 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.692087E+00 | loss scale: 65536.0 | grad norm: 76810.432 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.73 | iteration 16921/ 292968 | consumed samples: 34654208 | consumed tokens: 18238652416 | elapsed time per iteration (ms): 148970.2 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.698886E+00 | loss scale: 65536.0 | grad norm: 69193.411 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.11 | iteration 16922/ 292968 | consumed samples: 34656256 | consumed tokens: 18240684032 | elapsed time per iteration (ms): 149345.8 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.685139E+00 | loss scale: 65536.0 | grad norm: 79901.774 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.89 | iteration 16923/ 292968 | consumed samples: 34658304 | consumed tokens: 18242715648 | elapsed time per iteration (ms): 149829.2 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.715407E+00 | loss scale: 65536.0 | grad norm: 71945.268 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.60 | iteration 16924/ 292968 | consumed samples: 34660352 | consumed tokens: 18244747264 | elapsed time per iteration (ms): 148722.7 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.721333E+00 | loss scale: 65536.0 | grad norm: 64866.783 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.26 | iteration 16925/ 292968 | consumed samples: 34662400 | consumed tokens: 18246778880 | elapsed time per iteration (ms): 149232.1 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.725252E+00 | loss scale: 65536.0 | grad norm: 83512.303 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.95 | iteration 16926/ 292968 | consumed samples: 34664448 | consumed tokens: 18248810496 | elapsed time per iteration (ms): 148966.8 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.710025E+00 | loss scale: 65536.0 | grad norm: 61176.338 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.11 | iteration 16927/ 292968 | consumed samples: 34666496 | consumed tokens: 18250842112 | elapsed time per iteration (ms): 148955.1 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.725580E+00 | loss scale: 65536.0 | grad norm: 70979.164 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.12 | iteration 16928/ 292968 | consumed samples: 34668544 | consumed tokens: 18252873728 | elapsed time per iteration (ms): 150181.8 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.697002E+00 | loss scale: 65536.0 | grad norm: 68977.007 | num zeros: 0.0 | curriculum seqlen: 992 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.40 | iteration 16929/ 292968 | consumed samples: 34670592 | consumed tokens: 18254905344 | elapsed time per iteration (ms): 148869.7 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.709867E+00 | loss scale: 65536.0 | grad norm: 74290.840 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.17 | iteration 16930/ 292968 | consumed samples: 34672640 | consumed tokens: 18256936960 | elapsed time per iteration (ms): 148566.9 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.716198E+00 | loss scale: 65536.0 | grad norm: 91853.510 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.35 | iteration 16931/ 292968 | consumed samples: 34674688 | consumed tokens: 18258968576 | elapsed time per iteration (ms): 149632.9 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.710444E+00 | loss scale: 65536.0 | grad norm: 49218.367 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.72 | iteration 16932/ 292968 | consumed samples: 34676736 | consumed tokens: 18261000192 | elapsed time per iteration (ms): 149531.8 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.700948E+00 | loss scale: 65536.0 | grad norm: 73773.827 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.78 | iteration 16933/ 292968 | consumed samples: 34678784 | consumed tokens: 18263031808 | elapsed time per iteration (ms): 150428.9 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.721651E+00 | loss scale: 65536.0 | grad norm: 47074.120 | num zeros: 0.0 | curriculum seqlen: 
992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.25 | iteration 16934/ 292968 | consumed samples: 34680832 | consumed tokens: 18265063424 | elapsed time per iteration (ms): 149278.5 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.705924E+00 | loss scale: 65536.0 | grad norm: 68420.540 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.93 | iteration 16935/ 292968 | consumed samples: 34682880 | consumed tokens: 18267095040 | elapsed time per iteration (ms): 149261.8 | learning rate: 5.935E-05 | global batch size: 2048 | lm loss: 2.715352E+00 | loss scale: 65536.0 | grad norm: 87414.544 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.94 | iteration 16936/ 292968 | consumed samples: 34684928 | consumed tokens: 18269126656 | elapsed time per iteration (ms): 148994.6 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.699672E+00 | loss scale: 65536.0 | grad norm: 67536.166 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.09 | iteration 16937/ 292968 | consumed samples: 34686976 | consumed tokens: 18271158272 | elapsed time per iteration (ms): 149192.9 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.746296E+00 | loss scale: 65536.0 | grad norm: 113462.892 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.98 | iteration 16938/ 292968 | consumed samples: 34689024 | consumed tokens: 18273189888 | elapsed time per iteration (ms): 149849.7 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.681776E+00 | loss scale: 65536.0 | grad norm: 55081.969 | num zeros: 0.0 | 
curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.59 | iteration 16939/ 292968 | consumed samples: 34691072 | consumed tokens: 18275221504 | elapsed time per iteration (ms): 149325.7 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.731830E+00 | loss scale: 65536.0 | grad norm: 127224.326 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.90 | iteration 16940/ 292968 | consumed samples: 34693120 | consumed tokens: 18277253120 | elapsed time per iteration (ms): 149582.7 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.728563E+00 | loss scale: 65536.0 | grad norm: 88881.361 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.75 | iteration 16941/ 292968 | consumed samples: 34695168 | consumed tokens: 18279284736 | elapsed time per iteration (ms): 151839.7 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.729912E+00 | loss scale: 65536.0 | grad norm: 69962.669 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 86.44 | iteration 16942/ 292968 | consumed samples: 34697216 | consumed tokens: 18281316352 | elapsed time per iteration (ms): 148858.9 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.717329E+00 | loss scale: 65536.0 | grad norm: 91850.843 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.18 | iteration 16943/ 292968 | consumed samples: 34699264 | consumed tokens: 18283347968 | elapsed time per iteration (ms): 149999.1 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.716061E+00 | loss scale: 65536.0 | grad norm: 42957.981 | num 
zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.50 | iteration 16944/ 292968 | consumed samples: 34701312 | consumed tokens: 18285379584 | elapsed time per iteration (ms): 149348.5 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.737587E+00 | loss scale: 65536.0 | grad norm: 98287.143 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.89 | iteration 16945/ 292968 | consumed samples: 34703360 | consumed tokens: 18287411200 | elapsed time per iteration (ms): 148942.7 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.712087E+00 | loss scale: 65536.0 | grad norm: 58154.065 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.13 | iteration 16946/ 292968 | consumed samples: 34705408 | consumed tokens: 18289442816 | elapsed time per iteration (ms): 150240.8 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.729336E+00 | loss scale: 65536.0 | grad norm: 58896.184 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.36 | iteration 16947/ 292968 | consumed samples: 34707456 | consumed tokens: 18291474432 | elapsed time per iteration (ms): 149744.7 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.695366E+00 | loss scale: 65536.0 | grad norm: 71766.716 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.65 | iteration 16948/ 292968 | consumed samples: 34709504 | consumed tokens: 18293506048 | elapsed time per iteration (ms): 149205.8 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.716392E+00 | loss scale: 65536.0 | grad norm: 
67562.327 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.97 |
iteration 16949/ 292968 | consumed samples: 34711552 | consumed tokens: 18295537664 | elapsed time per iteration (ms): 149330.7 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.702588E+00 | loss scale: 65536.0 | grad norm: 70186.929 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.90 |
iteration 16950/ 292968 | consumed samples: 34713600 | consumed tokens: 18297569280 | elapsed time per iteration (ms): 149761.9 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.703901E+00 | loss scale: 65536.0 | grad norm: 70617.511 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.64 |
-------------------------------------------------------------------------------------------
valid loss at iteration 16950 | lm loss value: 3.122339E+00 | lm loss PPL: 2.269942E+01 |
-------------------------------------------------------------------------------------------
saving checkpoint at iteration 16950 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
[2022-02-06 14:04:19,222] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/mp_rank_01_model_states.pt
[2022-02-06 14:04:19,398] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/mp_rank_00_model_states.pt
[2022-02-06 14:04:34,191] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_105_optim_states.pt
[2022-02-06 14:04:35,071] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-02-06 14:04:35,150] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-02-06 14:04:35,947] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-02-06 14:04:36,309] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-02-06 14:04:37,504] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-02-06 14:04:37,732] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-02-06 14:04:37,735] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-02-06 14:04:37,794] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-02-06 14:04:37,846] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-02-06 14:04:37,813] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-02-06 14:04:37,946] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-02-06 14:04:37,967] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-02-06 14:04:38,242] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-02-06 14:04:38,259] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-02-06 14:04:38,293] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-02-06 14:04:38,642] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-02-06 14:04:38,656] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-02-06 14:04:39,763] [INFO] [engine.py:3023:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-02-06 14:04:40,052] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-02-06 14:04:40,151] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-02-06 14:04:40,208] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-02-06 14:04:40,209] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-02-06 14:04:40,226] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-02-06 14:04:40,259] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-02-06 14:04:41,425] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-02-06 14:04:41,504] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-02-06 14:04:41,968] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-02-06 14:04:42,031] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-02-06 14:04:42,182] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-02-06 14:04:42,171] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-02-06 14:04:42,284] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-02-06 14:04:42,369] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-02-06 14:04:44,687] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-02-06 14:04:44,752] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-02-06 14:04:44,957] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-02-06 
14:04:44,999] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-02-06 14:04:45,227] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-02-06 14:04:45,370] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-02-06 14:04:45,472] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-02-06 14:04:45,536] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-02-06 14:04:45,541] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-02-06 14:04:45,750] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-02-06 14:04:46,276] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-02-06 14:04:46,417] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-02-06 14:04:46,602] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-02-06 14:04:46,636] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-02-06 14:04:46,926] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-02-06 14:04:47,164] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-02-06 14:04:47,327] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-02-06 14:04:47,419] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-02-06 14:04:47,441] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-02-06 14:04:47,627] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-02-06 14:04:47,771] [INFO] [engine.py:3023:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-02-06 14:04:47,834] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-02-06 14:04:47,988] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-02-06 14:04:48,012] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-02-06 14:04:48,110] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-02-06 14:04:48,417] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-02-06 14:04:49,021] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-02-06 14:04:49,196] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-02-06 14:04:49,252] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-02-06 14:04:49,310] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-02-06 14:04:49,516] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-02-06 14:04:49,664] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-02-06 14:04:49,605] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-02-06 14:04:49,855] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-02-06 14:04:49,914] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-02-06 14:04:50,034] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-02-06 14:04:50,037] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-02-06 14:04:50,200] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-02-06 
14:04:50,273] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-02-06 14:04:50,338] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-02-06 14:04:50,505] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-02-06 14:04:50,529] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-02-06 14:04:50,670] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-02-06 14:04:50,986] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-02-06 14:04:51,012] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-02-06 14:04:51,024] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-02-06 14:04:52,010] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-02-06 14:04:52,088] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-02-06 14:04:52,143] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-02-06 14:04:52,208] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-02-06 14:04:52,240] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-02-06 14:04:52,282] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-02-06 14:04:52,297] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-02-06 14:04:52,316] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-02-06 14:04:52,780] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-02-06 14:04:52,825] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-02-06 14:04:52,837] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-02-06 14:04:52,914] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-02-06 14:04:53,060] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-02-06 14:04:53,446] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-02-06 14:04:53,554] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-02-06 14:04:53,841] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-02-06 14:04:53,860] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-02-06 14:04:53,869] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-02-06 14:04:54,063] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-02-06 14:04:54,118] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-02-06 14:04:54,129] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-02-06 14:04:54,533] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-02-06 14:04:54,597] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-02-06 14:04:54,751] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-02-06 14:04:54,772] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-02-06 14:04:55,617] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-02-06 14:04:55,699] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-02-06 
14:04:55,893] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-02-06 14:04:56,069] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-02-06 14:04:56,211] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-02-06 14:04:56,355] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-02-06 14:04:56,433] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-02-06 14:04:56,558] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-02-06 14:04:56,692] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-02-06 14:04:57,353] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-02-06 14:04:57,724] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-02-06 14:04:57,794] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-02-06 14:04:57,739] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-02-06 14:04:57,827] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-02-06 14:04:58,842] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-02-06 14:04:59,660] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-02-06 14:05:00,102] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-02-06 14:05:00,167] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_124_optim_states.pt [2022-02-06 14:05:01,006] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-02-06 14:05:01,025] [INFO] [engine.py:3023:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_50_optim_states.pt
[2022-02-06 14:05:01,278] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_02_optim_states.pt
[2022-02-06 14:05:01,324] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_03_optim_states.pt
[2022-02-06 14:05:04,603] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_01_optim_states.pt
[2022-02-06 14:05:04,658] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step16950/zero_pp_rank_0_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 16950 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
time (ms) | save-checkpoint: 51060.26
iteration 16951/ 292968 | consumed samples: 34715648 | consumed tokens: 18299600896 | elapsed time per iteration (ms): 660150.9 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.692519E+00 | loss scale: 65536.0 | grad norm: 74533.773 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.003 | TFLOPs: 19.88 |
iteration 16952/ 292968 | consumed samples: 34717696 | consumed tokens: 18301632512 | elapsed time per iteration (ms): 151977.9 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.719808E+00 | loss scale: 65536.0 | grad norm: 49990.479 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 |
TFLOPs: 86.37 | iteration 16953/ 292968 | consumed samples: 34719744 | consumed tokens: 18303664128 | elapsed time per iteration (ms): 152641.8 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.682543E+00 | loss scale: 65536.0 | grad norm: 69883.094 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 85.99 | iteration 16954/ 292968 | consumed samples: 34721792 | consumed tokens: 18305695744 | elapsed time per iteration (ms): 150590.8 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.700202E+00 | loss scale: 65536.0 | grad norm: 86669.374 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.16 | iteration 16955/ 292968 | consumed samples: 34723840 | consumed tokens: 18307727360 | elapsed time per iteration (ms): 149733.9 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.707797E+00 | loss scale: 65536.0 | grad norm: 53086.287 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.66 | iteration 16956/ 292968 | consumed samples: 34725888 | consumed tokens: 18309758976 | elapsed time per iteration (ms): 149580.1 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.711225E+00 | loss scale: 65536.0 | grad norm: 49945.459 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.75 | iteration 16957/ 292968 | consumed samples: 34727936 | consumed tokens: 18311790592 | elapsed time per iteration (ms): 149359.7 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.739030E+00 | loss scale: 65536.0 | grad norm: 64827.994 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 0.014 | TFLOPs: 87.88 | iteration 16958/ 292968 | consumed samples: 34729984 | consumed tokens: 18313822208 | elapsed time per iteration (ms): 150420.3 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.693839E+00 | loss scale: 65536.0 | grad norm: 86103.807 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.26 | iteration 16959/ 292968 | consumed samples: 34732032 | consumed tokens: 18315853824 | elapsed time per iteration (ms): 149956.9 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.684352E+00 | loss scale: 65536.0 | grad norm: 63760.496 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.53 | iteration 16960/ 292968 | consumed samples: 34734080 | consumed tokens: 18317885440 | elapsed time per iteration (ms): 149229.5 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.722105E+00 | loss scale: 65536.0 | grad norm: 84821.916 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.96 | iteration 16961/ 292968 | consumed samples: 34736128 | consumed tokens: 18319917056 | elapsed time per iteration (ms): 148999.6 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.718212E+00 | loss scale: 65536.0 | grad norm: 62257.305 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 88.09 | iteration 16962/ 292968 | consumed samples: 34738176 | consumed tokens: 18321948672 | elapsed time per iteration (ms): 150244.6 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.746654E+00 | loss scale: 65536.0 | grad norm: 79087.862 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 0.014 | TFLOPs: 87.36 | iteration 16963/ 292968 | consumed samples: 34740224 | consumed tokens: 18323980288 | elapsed time per iteration (ms): 149501.4 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.702177E+00 | loss scale: 65536.0 | grad norm: 73577.959 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.80 | iteration 16964/ 292968 | consumed samples: 34742272 | consumed tokens: 18326011904 | elapsed time per iteration (ms): 149554.0 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.725727E+00 | loss scale: 65536.0 | grad norm: 85894.191 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.77 | iteration 16965/ 292968 | consumed samples: 34744320 | consumed tokens: 18328043520 | elapsed time per iteration (ms): 149957.3 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.718404E+00 | loss scale: 65536.0 | grad norm: 68411.978 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.53 | iteration 16966/ 292968 | consumed samples: 34746368 | consumed tokens: 18330075136 | elapsed time per iteration (ms): 150772.9 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.717603E+00 | loss scale: 65536.0 | grad norm: 62226.745 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.06 | iteration 16967/ 292968 | consumed samples: 34748416 | consumed tokens: 18332106752 | elapsed time per iteration (ms): 149905.1 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.739357E+00 | loss scale: 65536.0 | grad norm: 58114.399 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 0.014 | TFLOPs: 87.56 | iteration 16968/ 292968 | consumed samples: 34750464 | consumed tokens: 18334138368 | elapsed time per iteration (ms): 150686.0 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.711397E+00 | loss scale: 65536.0 | grad norm: 76004.861 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.11 | iteration 16969/ 292968 | consumed samples: 34752512 | consumed tokens: 18336169984 | elapsed time per iteration (ms): 150048.5 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.719254E+00 | loss scale: 65536.0 | grad norm: 56669.332 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.48 | iteration 16970/ 292968 | consumed samples: 34754560 | consumed tokens: 18338201600 | elapsed time per iteration (ms): 150067.3 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.755551E+00 | loss scale: 65536.0 | grad norm: 60547.620 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.47 | iteration 16971/ 292968 | consumed samples: 34756608 | consumed tokens: 18340233216 | elapsed time per iteration (ms): 160623.4 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.738809E+00 | loss scale: 65536.0 | grad norm: 60045.246 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 81.72 | iteration 16972/ 292968 | consumed samples: 34758656 | consumed tokens: 18342264832 | elapsed time per iteration (ms): 150487.7 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.760576E+00 | loss scale: 65536.0 | grad norm: 61036.317 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.22 | iteration 16973/ 292968 | consumed samples: 34760704 | consumed tokens: 18344296448 | elapsed time per iteration (ms): 154410.0 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.755463E+00 | loss scale: 65536.0 | grad norm: 67693.359 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 85.01 | iteration 16974/ 292968 | consumed samples: 34762752 | consumed tokens: 18346328064 | elapsed time per iteration (ms): 153111.2 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.741009E+00 | loss scale: 65536.0 | grad norm: 79568.423 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 85.73 | iteration 16975/ 292968 | consumed samples: 34764800 | consumed tokens: 18348359680 | elapsed time per iteration (ms): 151593.6 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.738951E+00 | loss scale: 65536.0 | grad norm: 70090.761 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 86.58 | iteration 16976/ 292968 | consumed samples: 34766848 | consumed tokens: 18350391296 | elapsed time per iteration (ms): 150650.4 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.737767E+00 | loss scale: 65536.0 | grad norm: 66565.393 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.13 | iteration 16977/ 292968 | consumed samples: 34768896 | consumed tokens: 18352422912 | elapsed time per iteration (ms): 150147.7 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.721330E+00 | loss scale: 65536.0 | grad norm: 57944.597 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.42 | iteration 16978/ 292968 | consumed samples: 34770944 | consumed tokens: 18354454528 | elapsed time per iteration (ms): 149941.4 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.736767E+00 | loss scale: 65536.0 | grad norm: 63297.696 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.54 | iteration 16979/ 292968 | consumed samples: 34772992 | consumed tokens: 18356486144 | elapsed time per iteration (ms): 150229.3 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.726599E+00 | loss scale: 65536.0 | grad norm: 76225.459 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.37 | iteration 16980/ 292968 | consumed samples: 34775040 | consumed tokens: 18358517760 | elapsed time per iteration (ms): 150018.3 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.744030E+00 | loss scale: 65536.0 | grad norm: 59897.430 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.49 | iteration 16981/ 292968 | consumed samples: 34777088 | consumed tokens: 18360549376 | elapsed time per iteration (ms): 149677.8 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.705761E+00 | loss scale: 65536.0 | grad norm: 57126.102 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.69 | iteration 16982/ 292968 | consumed samples: 34779136 | consumed tokens: 18362580992 | elapsed time per iteration (ms): 149672.1 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.711971E+00 | loss scale: 65536.0 | grad norm: 80691.416 | num zeros: 0.0 | curriculum seqlen: 992 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.70 | iteration 16983/ 292968 | consumed samples: 34781184 | consumed tokens: 18364612608 | elapsed time per iteration (ms): 149643.0 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.711076E+00 | loss scale: 65536.0 | grad norm: 66353.483 | num zeros: 0.0 | curriculum seqlen: 992 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.014 | TFLOPs: 87.71 | iteration 16984/ 292968 | consumed samples: 34783232 | consumed tokens: 18366660608 | elapsed time per iteration (ms): 159850.6 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.721797E+00 | loss scale: 65536.0 | grad norm: 63019.605 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.77 | iteration 16985/ 292968 | consumed samples: 34785280 | consumed tokens: 18368708608 | elapsed time per iteration (ms): 158865.6 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.718378E+00 | loss scale: 65536.0 | grad norm: 65471.524 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.29 | iteration 16986/ 292968 | consumed samples: 34787328 | consumed tokens: 18370756608 | elapsed time per iteration (ms): 158878.9 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.744819E+00 | loss scale: 65536.0 | grad norm: 95211.560 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.28 | iteration 16987/ 292968 | consumed samples: 34789376 | consumed tokens: 18372804608 | elapsed time per iteration (ms): 159060.5 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.703067E+00 | loss scale: 65536.0 | grad norm: 62914.348 | num zeros: 0.0 | curriculum seqlen: 
1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.19 | iteration 16988/ 292968 | consumed samples: 34791424 | consumed tokens: 18374852608 | elapsed time per iteration (ms): 159086.6 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.717930E+00 | loss scale: 65536.0 | grad norm: 81564.208 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.17 | iteration 16989/ 292968 | consumed samples: 34793472 | consumed tokens: 18376900608 | elapsed time per iteration (ms): 159904.8 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.744952E+00 | loss scale: 65536.0 | grad norm: 50071.016 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.75 | iteration 16990/ 292968 | consumed samples: 34795520 | consumed tokens: 18378948608 | elapsed time per iteration (ms): 159212.0 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.717678E+00 | loss scale: 65536.0 | grad norm: 67811.233 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.11 | iteration 16991/ 292968 | consumed samples: 34797568 | consumed tokens: 18380996608 | elapsed time per iteration (ms): 159432.1 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.725503E+00 | loss scale: 65536.0 | grad norm: 87712.430 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.99 | iteration 16992/ 292968 | consumed samples: 34799616 | consumed tokens: 18383044608 | elapsed time per iteration (ms): 159036.6 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.712101E+00 | loss scale: 65536.0 | grad norm: 46242.515 | num zeros: 0.0 | 
curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.20 | iteration 16993/ 292968 | consumed samples: 34801664 | consumed tokens: 18385092608 | elapsed time per iteration (ms): 162602.5 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.717126E+00 | loss scale: 65536.0 | grad norm: 68460.426 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 81.37 | iteration 16994/ 292968 | consumed samples: 34803712 | consumed tokens: 18387140608 | elapsed time per iteration (ms): 158937.3 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.698225E+00 | loss scale: 65536.0 | grad norm: 68080.933 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.25 | iteration 16995/ 292968 | consumed samples: 34805760 | consumed tokens: 18389188608 | elapsed time per iteration (ms): 159134.7 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.739546E+00 | loss scale: 65536.0 | grad norm: 53876.249 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.15 | iteration 16996/ 292968 | consumed samples: 34807808 | consumed tokens: 18391236608 | elapsed time per iteration (ms): 159011.1 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.740600E+00 | loss scale: 65536.0 | grad norm: 101677.195 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.21 | iteration 16997/ 292968 | consumed samples: 34809856 | consumed tokens: 18393284608 | elapsed time per iteration (ms): 159010.2 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.742258E+00 | loss scale: 65536.0 | grad norm: 47230.745 | 
num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.21 |
iteration 16998/ 292968 | consumed samples: 34811904 | consumed tokens: 18395332608 | elapsed time per iteration (ms): 158849.1 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.767190E+00 | loss scale: 65536.0 | grad norm: 91832.772 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.30 |
iteration 16999/ 292968 | consumed samples: 34813952 | consumed tokens: 18397380608 | elapsed time per iteration (ms): 158913.7 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.755822E+00 | loss scale: 65536.0 | grad norm: 70962.749 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.26 |
iteration 17000/ 292968 | consumed samples: 34816000 | consumed tokens: 18399428608 | elapsed time per iteration (ms): 159542.5 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.732614E+00 | loss scale: 65536.0 | grad norm: 65842.077 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.93 |
saving checkpoint at iteration 17000 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
[2022-02-06 16:13:15,130] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/mp_rank_00_model_states.pt
[2022-02-06 16:13:15,654] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/mp_rank_01_model_states.pt
[2022-02-06 16:13:30,658] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-02-06 16:13:31,262] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-02-06 16:13:31,539] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-02-06 16:13:32,457] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-02-06 16:13:32,474] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-02-06 16:13:32,567] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-02-06 16:13:33,155] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-02-06 16:13:33,167] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-02-06 16:13:33,452] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-02-06 16:13:33,491] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_111_optim_states.pt
[2022-02-06 16:13:33,780] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_99_optim_states.pt
[2022-02-06 16:13:33,898] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_98_optim_states.pt
[2022-02-06 16:13:34,050] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_102_optim_states.pt
[2022-02-06 16:13:34,219] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_103_optim_states.pt
[2022-02-06 16:13:34,465] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_96_optim_states.pt
[2022-02-06 16:13:34,502] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_97_optim_states.pt
[2022-02-06 16:13:35,745] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_41_optim_states.pt
[2022-02-06 16:13:36,104] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_44_optim_states.pt
[2022-02-06 16:13:37,297] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_45_optim_states.pt
[2022-02-06 16:13:37,635] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_47_optim_states.pt
[2022-02-06 16:13:37,661] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_40_optim_states.pt
[2022-02-06 16:13:37,805] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_46_optim_states.pt
[2022-02-06 16:13:37,833] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_42_optim_states.pt
[2022-02-06 16:13:38,053] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_43_optim_states.pt
[2022-02-06 16:13:39,019] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_38_optim_states.pt
[2022-02-06 16:13:39,095] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_39_optim_states.pt
[2022-02-06 16:13:39,228] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_75_optim_states.pt
[2022-02-06 16:13:39,540] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_34_optim_states.pt
[2022-02-06 16:13:39,612] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_18_optim_states.pt
[2022-02-06 16:13:39,651] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_82_optim_states.pt
[2022-02-06 16:13:39,821] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_35_optim_states.pt
[2022-02-06 16:13:40,060] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_37_optim_states.pt
[2022-02-06 16:13:40,144] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_83_optim_states.pt
[2022-02-06 16:13:40,313] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_36_optim_states.pt
[2022-02-06 16:13:40,592] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_33_optim_states.pt
[2022-02-06 16:13:40,650] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_32_optim_states.pt
[2022-02-06 16:13:41,896] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_74_optim_states.pt
[2022-02-06 16:13:42,085] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_67_optim_states.pt
[2022-02-06 16:13:42,151] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_80_optim_states.pt
[2022-02-06 16:13:42,746] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_61_optim_states.pt
[2022-02-06 16:13:42,932] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_81_optim_states.pt
[2022-02-06 16:13:42,949] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_19_optim_states.pt
[2022-02-06 16:13:43,076] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_30_optim_states.pt
[2022-02-06 16:13:43,054] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_66_optim_states.pt
[2022-02-06 16:13:43,076] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_31_optim_states.pt
[2022-02-06 16:13:43,403] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_60_optim_states.pt
[2022-02-06 16:13:43,440] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_58_optim_states.pt
[2022-02-06 16:13:43,755] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_59_optim_states.pt
[2022-02-06 16:13:43,755] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_57_optim_states.pt
[2022-02-06 16:13:43,775] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_56_optim_states.pt
[2022-02-06 16:13:43,972] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_62_optim_states.pt
[2022-02-06 16:13:44,168] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_63_optim_states.pt
[2022-02-06 16:13:44,247] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_12_optim_states.pt
[2022-02-06 16:13:44,266] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_13_optim_states.pt
[2022-02-06 16:13:44,520] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_10_optim_states.pt
[2022-02-06 16:13:44,588] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_11_optim_states.pt
[2022-02-06 16:13:44,611] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_15_optim_states.pt
[2022-02-06 16:13:44,618] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_14_optim_states.pt
[2022-02-06 16:13:44,804] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_06_optim_states.pt
[2022-02-06 16:13:44,947] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_08_optim_states.pt
[2022-02-06 16:13:45,028] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_07_optim_states.pt
[2022-02-06 16:13:45,126] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_77_optim_states.pt
[2022-02-06 16:13:45,240] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_73_optim_states.pt
[2022-02-06 16:13:45,327] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_76_optim_states.pt
[2022-02-06 16:13:45,375] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_09_optim_states.pt
[2022-02-06 16:13:45,857] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_72_optim_states.pt
[2022-02-06 16:13:46,830] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_23_optim_states.pt
[2022-02-06 16:13:48,170] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_86_optim_states.pt
[2022-02-06 16:13:48,235] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_87_optim_states.pt
[2022-02-06 16:13:48,308] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_26_optim_states.pt
[2022-02-06 16:13:48,375] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_70_optim_states.pt
[2022-02-06 16:13:48,565] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_88_optim_states.pt
[2022-02-06 16:13:48,592] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_65_optim_states.pt
[2022-02-06 16:13:48,613] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_71_optim_states.pt
[2022-02-06 16:13:48,671] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_89_optim_states.pt
[2022-02-06 16:13:48,737] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_64_optim_states.pt
[2022-02-06 16:13:49,068] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_78_optim_states.pt
[2022-02-06 16:13:49,175] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_27_optim_states.pt
[2022-02-06 16:13:49,176] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_79_optim_states.pt
[2022-02-06 16:13:49,490] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_91_optim_states.pt
[2022-02-06 16:13:49,628] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_85_optim_states.pt
[2022-02-06 16:13:49,706] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_84_optim_states.pt
[2022-02-06 16:13:49,864] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_68_optim_states.pt
[2022-02-06 16:13:50,107] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_69_optim_states.pt
[2022-02-06 16:13:50,944] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_115_optim_states.pt
[2022-02-06 16:13:50,952] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_119_optim_states.pt
[2022-02-06 16:13:51,007] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_118_optim_states.pt
[2022-02-06 16:13:51,089] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_114_optim_states.pt
[2022-02-06 16:13:51,096] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_116_optim_states.pt
[2022-02-06 16:13:51,827] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_117_optim_states.pt
[2022-02-06 16:13:51,927] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_22_optim_states.pt
[2022-02-06 16:13:51,950] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_20_optim_states.pt
[2022-02-06 16:13:52,094] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_21_optim_states.pt
[2022-02-06 16:13:52,259] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_113_optim_states.pt
[2022-02-06 16:13:52,314] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_112_optim_states.pt
[2022-02-06 16:13:52,452] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_90_optim_states.pt
[2022-02-06 16:13:52,532] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_93_optim_states.pt
[2022-02-06 16:13:52,561] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_127_optim_states.pt
[2022-02-06 16:13:52,704] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_92_optim_states.pt
[2022-02-06 16:13:52,975] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_123_optim_states.pt
[2022-02-06 16:13:53,034] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_122_optim_states.pt
[2022-02-06 16:13:53,092] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_126_optim_states.pt
[2022-02-06 16:13:53,424] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_16_optim_states.pt
[2022-02-06 16:13:53,554] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_17_optim_states.pt
[2022-02-06 16:13:53,594] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_25_optim_states.pt
[2022-02-06 16:13:53,675] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_24_optim_states.pt
[2022-02-06 16:13:53,952] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_95_optim_states.pt
[2022-02-06 16:13:54,169] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_94_optim_states.pt
[2022-02-06 16:13:55,146] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_29_optim_states.pt
[2022-02-06 16:13:55,194] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_28_optim_states.pt
[2022-02-06 16:13:55,431] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_54_optim_states.pt
[2022-02-06 16:13:55,433] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_55_optim_states.pt
[2022-02-06 16:13:55,575] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_52_optim_states.pt
[2022-02-06 16:13:55,664] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_53_optim_states.pt
[2022-02-06 16:13:56,383] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_48_optim_states.pt
[2022-02-06 16:13:56,492] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_49_optim_states.pt
[2022-02-06 16:13:56,907] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_51_optim_states.pt
[2022-02-06 16:13:56,938] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_50_optim_states.pt
[2022-02-06 16:13:57,901] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_120_optim_states.pt
[2022-02-06 16:13:57,987] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_121_optim_states.pt
[2022-02-06 16:13:59,206] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_04_optim_states.pt
[2022-02-06 16:13:59,235] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_05_optim_states.pt
[2022-02-06 16:14:01,452] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_01_optim_states.pt
[2022-02-06 16:14:01,458] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2022-02-06 16:14:02,211] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_02_optim_states.pt
[2022-02-06 16:14:02,384] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_03_optim_states.pt
[2022-02-06 16:14:03,831] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_124_optim_states.pt
[2022-02-06 16:14:03,984] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17000/zero_pp_rank_0_mp_rank_125_optim_states.pt
successfully saved checkpoint at iteration 17000 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
time (ms) | save-checkpoint: 54243.32
iteration 17001/ 292968 | consumed samples: 34818048 | consumed tokens: 18401476608 | elapsed time per iteration (ms): 213182.2 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.737826E+00 | loss scale: 65536.0 | grad norm: 76557.880 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.010 | TFLOPs: 62.07 |
iteration 17002/ 292968 | consumed samples: 34820096 | consumed tokens: 18403524608 | elapsed time per iteration (ms): 159189.4 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.749738E+00 | loss scale: 65536.0 | grad norm: 60329.296 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.12 |
iteration 17003/ 292968 | consumed samples: 34822144 | consumed tokens: 18405572608 | elapsed time per iteration (ms): 158650.4 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.741941E+00 | loss scale: 65536.0 | grad norm: 68018.848 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.40 |
iteration 17004/ 292968 | consumed samples: 34824192 | consumed tokens: 18407620608 | elapsed time per iteration (ms): 159129.1 | learning rate: 5.934E-05 | global batch size: 2048 | lm loss: 2.744120E+00 | loss scale: 65536.0 | grad norm: 83280.722 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.15 |
iteration 17005/ 292968 | consumed samples: 34826240 | consumed tokens: 18409668608 | elapsed time per iteration (ms): 164143.5 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.732377E+00 | loss scale: 65536.0 | grad norm: 65448.331 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 80.61 |
iteration 17006/ 292968 | consumed samples: 34828288 | consumed tokens: 18411716608 | elapsed time per iteration (ms): 159749.4 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.753265E+00 | loss scale: 65536.0 | grad norm: 83800.636 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.83 |
iteration 17007/ 292968 | consumed samples: 34830336 | consumed tokens: 18413764608 | elapsed time per iteration (ms): 159155.0 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.733603E+00 | loss scale: 65536.0 | grad norm: 45313.328 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.14 |
iteration 17008/ 292968 | consumed samples: 34832384 | consumed tokens: 18415812608 | elapsed time per iteration (ms): 159197.9 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.717291E+00 | loss scale: 65536.0 | grad norm: 63953.522 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.11 |
iteration 17009/ 292968 | consumed samples: 34834432 | consumed tokens: 18417860608 | elapsed time per iteration (ms): 159131.2 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.719382E+00 | loss scale: 65536.0 | grad norm: 86381.203 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.15 |
iteration 17010/ 292968 | consumed samples: 34836480 | consumed tokens: 18419908608 | elapsed time per iteration (ms): 158862.3 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.728508E+00 | loss scale: 65536.0 | grad norm: 68922.430 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.29 |
iteration 17011/ 292968 | consumed samples: 34838528 | consumed tokens: 18421956608 | elapsed time per iteration (ms): 158840.5 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.727998E+00 | loss scale: 65536.0 | grad norm: 74561.675 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.30 |
iteration 17012/ 292968 | consumed samples: 34840576 | consumed tokens: 18424004608 | elapsed time per iteration (ms): 159238.0 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.732754E+00 | loss scale: 65536.0 | grad norm: 62678.295 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.09 |
iteration 17013/ 292968 | consumed samples: 34842624 | consumed tokens: 18426052608 | elapsed time per iteration (ms): 158623.9 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.753321E+00 | loss scale: 65536.0 | grad norm: 76767.586 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.41 |
iteration 17014/ 292968 | consumed samples: 34844672 | consumed tokens: 18428100608 | elapsed time per iteration (ms): 158828.4 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.733307E+00 | loss scale: 65536.0 | grad norm: 66774.412 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.31 |
iteration 17015/ 292968 | consumed samples: 34846720 | consumed tokens: 18430148608 | elapsed time per iteration (ms): 159998.5 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.757985E+00 | loss scale: 65536.0 | grad norm: 93252.697 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.70 |
iteration 17016/ 292968 | consumed samples: 34848768 | consumed tokens: 18432196608 | elapsed time per iteration (ms): 159196.6 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.764874E+00 | loss scale: 65536.0 | grad norm: 61449.550 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.11 |
iteration 17017/ 292968 | consumed samples: 34850816 | consumed tokens: 18434244608 | elapsed time per iteration (ms): 159585.4 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.750633E+00 | loss scale: 65536.0 | grad norm: 94652.390 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.91 |
iteration 17018/ 292968 | consumed samples: 34852864 | consumed tokens: 18436292608 | elapsed time per iteration (ms): 159164.4 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.749644E+00 | loss scale: 65536.0 | grad norm: 45332.420 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.13 |
iteration 17019/ 292968 | consumed samples: 34854912 | consumed tokens: 18438340608 | elapsed time per iteration (ms): 158753.4 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.746034E+00 | loss scale: 65536.0 | grad norm: 94317.805 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.35 |
iteration 17020/ 292968 | consumed samples: 34856960 | consumed tokens: 18440388608 | elapsed time per iteration (ms): 158954.3 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.751243E+00 | loss scale: 65536.0 | grad norm: 56166.703 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.24 |
iteration 17021/ 292968 | consumed samples: 34859008 | consumed tokens: 18442436608 | elapsed time per iteration (ms): 159039.2 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.723892E+00 | loss scale: 65536.0 | grad norm: 76948.024 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.20 |
iteration 17022/ 292968 | consumed samples: 34861056 | consumed tokens: 18444484608 | elapsed time per iteration (ms): 158773.1 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.762304E+00 | loss scale: 65536.0 | grad norm: 94848.578 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.34 |
iteration 17023/ 292968 | consumed samples: 34863104 | consumed tokens: 18446532608 | elapsed time per iteration (ms): 158677.7 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.763886E+00 | loss scale: 65536.0 | grad norm: 50329.351 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.39 |
iteration 17024/ 292968 | consumed samples: 34865152 | consumed tokens: 18448580608 | elapsed time per iteration (ms): 158822.1 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.732269E+00 | loss scale: 65536.0 | grad norm: 81659.841 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.31 |
iteration 17025/ 292968 | consumed samples: 34867200 | consumed tokens: 18450628608 | elapsed time per iteration (ms): 158291.4 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.750888E+00 | loss scale: 65536.0 | grad norm: 74114.917 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.59 |
iteration 17026/ 292968 | consumed samples: 34869248 | consumed tokens: 18452676608 | elapsed time per iteration (ms): 159277.2 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.726934E+00 | loss scale: 65536.0 | grad norm: 49898.690 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.07 |
iteration 17027/ 292968 | consumed samples: 34871296 | consumed tokens: 18454724608 | elapsed time per iteration (ms): 160662.6 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.756490E+00 | loss scale: 65536.0 | grad norm: 73249.631 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.36 |
iteration 17028/ 292968 | consumed samples: 34873344 | consumed tokens: 18456772608 | elapsed time per iteration (ms): 162145.3 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.739433E+00 | loss scale: 65536.0 | grad norm: 69920.358 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 81.60 |
iteration 17029/ 292968 | consumed samples: 34875392 | consumed tokens: 18458820608 | elapsed time per iteration (ms): 158684.7 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.747168E+00 | loss scale: 65536.0 | grad norm: 68081.341 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.38 |
iteration 17030/ 292968 | consumed samples: 34877440 | consumed tokens: 18460868608 | elapsed time per iteration (ms): 159324.7 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.724999E+00 | loss scale: 65536.0 | grad norm: 75881.336 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.05 |
iteration 17031/ 292968 | consumed samples: 34879488 | consumed tokens: 18462916608 | elapsed time per iteration (ms): 159347.7 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.765899E+00 | loss scale: 65536.0 | grad norm: 79871.341 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.04 |
iteration 17032/ 292968 | consumed samples: 34881536 | consumed tokens: 18464964608 | elapsed time per iteration (ms): 162121.2 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.741184E+00 | loss scale: 
65536.0 | grad norm: 63324.910 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 81.61 | iteration 17033/ 292968 | consumed samples: 34883584 | consumed tokens: 18467012608 | elapsed time per iteration (ms): 158942.8 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.733585E+00 | loss scale: 65536.0 | grad norm: 79802.729 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.25 | iteration 17034/ 292968 | consumed samples: 34885632 | consumed tokens: 18469060608 | elapsed time per iteration (ms): 159187.6 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.727436E+00 | loss scale: 65536.0 | grad norm: 46208.770 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.12 | iteration 17035/ 292968 | consumed samples: 34887680 | consumed tokens: 18471108608 | elapsed time per iteration (ms): 158981.2 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.735993E+00 | loss scale: 65536.0 | grad norm: 60478.354 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.23 | iteration 17036/ 292968 | consumed samples: 34889728 | consumed tokens: 18473156608 | elapsed time per iteration (ms): 159428.6 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.747424E+00 | loss scale: 65536.0 | grad norm: 83650.997 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.99 | iteration 17037/ 292968 | consumed samples: 34891776 | consumed tokens: 18475204608 | elapsed time per iteration (ms): 159966.5 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 
2.758623E+00 | loss scale: 65536.0 | grad norm: 36613.811 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.71 | iteration 17038/ 292968 | consumed samples: 34893824 | consumed tokens: 18477252608 | elapsed time per iteration (ms): 158538.5 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.762983E+00 | loss scale: 65536.0 | grad norm: 74196.228 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.46 | iteration 17039/ 292968 | consumed samples: 34895872 | consumed tokens: 18479300608 | elapsed time per iteration (ms): 158668.1 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.750100E+00 | loss scale: 65536.0 | grad norm: 64540.345 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.39 | iteration 17040/ 292968 | consumed samples: 34897920 | consumed tokens: 18481348608 | elapsed time per iteration (ms): 158848.6 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.746601E+00 | loss scale: 65536.0 | grad norm: 86445.834 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.30 | iteration 17041/ 292968 | consumed samples: 34899968 | consumed tokens: 18483396608 | elapsed time per iteration (ms): 158512.9 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.748436E+00 | loss scale: 65536.0 | grad norm: 74423.802 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.47 | iteration 17042/ 292968 | consumed samples: 34902016 | consumed tokens: 18485444608 | elapsed time per iteration (ms): 158823.9 | learning rate: 5.933E-05 | global batch size: 
2048 | lm loss: 2.756881E+00 | loss scale: 65536.0 | grad norm: 70849.925 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.31 | iteration 17043/ 292968 | consumed samples: 34904064 | consumed tokens: 18487492608 | elapsed time per iteration (ms): 158467.1 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.762121E+00 | loss scale: 65536.0 | grad norm: 72271.685 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.50 | iteration 17044/ 292968 | consumed samples: 34906112 | consumed tokens: 18489540608 | elapsed time per iteration (ms): 158959.7 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.743739E+00 | loss scale: 65536.0 | grad norm: 78712.987 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.24 | iteration 17045/ 292968 | consumed samples: 34908160 | consumed tokens: 18491588608 | elapsed time per iteration (ms): 159190.7 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.730422E+00 | loss scale: 65536.0 | grad norm: 82250.610 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.12 | iteration 17046/ 292968 | consumed samples: 34910208 | consumed tokens: 18493636608 | elapsed time per iteration (ms): 159182.9 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.757403E+00 | loss scale: 65536.0 | grad norm: 71928.594 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.12 | iteration 17047/ 292968 | consumed samples: 34912256 | consumed tokens: 18495684608 | elapsed time per iteration (ms): 158882.7 | learning rate: 5.933E-05 | 
global batch size: 2048 | lm loss: 2.725855E+00 | loss scale: 65536.0 | grad norm: 70232.367 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.28 | iteration 17048/ 292968 | consumed samples: 34914304 | consumed tokens: 18497732608 | elapsed time per iteration (ms): 160145.4 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.749976E+00 | loss scale: 65536.0 | grad norm: 76549.455 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.62 | iteration 17049/ 292968 | consumed samples: 34916352 | consumed tokens: 18499780608 | elapsed time per iteration (ms): 158638.5 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.725958E+00 | loss scale: 65536.0 | grad norm: 58059.361 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.41 | iteration 17050/ 292968 | consumed samples: 34918400 | consumed tokens: 18501828608 | elapsed time per iteration (ms): 158934.6 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.744106E+00 | loss scale: 65536.0 | grad norm: 80888.584 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.25 | saving checkpoint at iteration 17050 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 [2022-02-06 18:27:16,080] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/mp_rank_00_model_states.pt [2022-02-06 18:27:16,259] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/mp_rank_01_model_states.pt [2022-02-06 
18:27:47,783] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-02-06 18:27:50,507] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-02-06 18:27:50,584] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-02-06 18:27:50,674] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-02-06 18:27:50,776] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-02-06 18:27:50,930] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-02-06 18:27:51,017] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-02-06 18:27:51,314] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-02-06 18:27:51,512] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-02-06 18:27:51,884] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-02-06 18:27:51,907] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-02-06 18:27:52,015] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-02-06 18:27:52,120] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-02-06 18:27:52,202] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-02-06 18:27:52,235] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-02-06 18:27:52,291] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-02-06 18:27:56,987] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-02-06 18:27:58,823] [INFO] [engine.py:3023:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-02-06 18:27:58,833] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-02-06 18:27:59,454] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-02-06 18:28:01,074] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-02-06 18:28:01,102] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-02-06 18:28:01,308] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-02-06 18:28:01,358] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-02-06 18:28:01,468] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-02-06 18:28:01,773] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-02-06 18:28:02,182] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-02-06 18:28:02,257] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-02-06 18:28:02,285] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-02-06 18:28:02,841] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-02-06 18:28:03,172] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-02-06 18:28:03,269] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-02-06 18:28:03,384] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-02-06 18:28:03,550] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-02-06 18:28:03,713] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-02-06 
18:28:03,713] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-02-06 18:28:04,410] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-02-06 18:28:04,983] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-02-06 18:28:05,555] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-02-06 18:28:05,779] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-02-06 18:28:05,875] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-02-06 18:28:05,888] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-02-06 18:28:05,982] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-02-06 18:28:06,058] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-02-06 18:28:06,198] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-02-06 18:28:06,320] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-02-06 18:28:06,343] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-02-06 18:28:06,626] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-02-06 18:28:06,696] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-02-06 18:28:06,716] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-02-06 18:28:06,789] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-02-06 18:28:07,044] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-02-06 18:28:07,659] [INFO] [engine.py:3023:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-02-06 18:28:07,679] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-02-06 18:28:07,927] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-02-06 18:28:08,064] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-02-06 18:28:08,086] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-02-06 18:28:08,093] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-02-06 18:28:08,453] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-02-06 18:28:08,534] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-02-06 18:28:08,530] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-02-06 18:28:08,542] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-02-06 18:28:08,641] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-02-06 18:28:08,662] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-02-06 18:28:09,024] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-02-06 18:28:09,109] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-02-06 18:28:09,170] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-02-06 18:28:09,517] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-02-06 18:28:09,948] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_124_optim_states.pt [2022-02-06 18:28:10,306] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-02-06 
18:28:10,330] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-02-06 18:28:10,340] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-02-06 18:28:10,431] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-02-06 18:28:10,611] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-02-06 18:28:10,548] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-02-06 18:28:10,781] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-02-06 18:28:10,871] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-02-06 18:28:10,879] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-02-06 18:28:10,921] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-02-06 18:28:10,948] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-02-06 18:28:10,977] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-02-06 18:28:10,995] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-02-06 18:28:11,028] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-02-06 18:28:11,231] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-02-06 18:28:11,324] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-02-06 18:28:11,178] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-02-06 18:28:11,509] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-02-06 18:28:11,811] [INFO] [engine.py:3023:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-02-06 18:28:12,185] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-02-06 18:28:12,466] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-02-06 18:28:12,504] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-02-06 18:28:12,596] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-02-06 18:28:12,599] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-02-06 18:28:12,606] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-02-06 18:28:12,663] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-02-06 18:28:12,990] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-02-06 18:28:13,993] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-02-06 18:28:14,023] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-02-06 18:28:14,174] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-02-06 18:28:14,176] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-02-06 18:28:15,635] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-02-06 18:28:15,688] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-02-06 18:28:16,458] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-02-06 18:28:16,522] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-02-06 18:28:16,564] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-02-06 
18:28:17,801] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-02-06 18:28:17,960] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-02-06 18:28:18,120] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-02-06 18:28:19,923] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-02-06 18:28:20,009] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-02-06 18:28:20,220] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-02-06 18:28:21,153] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-02-06 18:28:22,660] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-02-06 18:28:22,771] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-02-06 18:28:24,709] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-02-06 18:28:25,499] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-02-06 18:28:25,502] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-02-06 18:28:25,583] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-02-06 18:28:25,662] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-02-06 18:28:27,049] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-02-06 18:28:27,098] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-02-06 18:28:47,968] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-02-06 18:29:02,850] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-02-06 18:29:10,739] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-02-06 18:29:11,016] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-02-06 18:29:11,045] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-02-06 18:29:12,906] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-02-06 18:29:12,976] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17050/zero_pp_rank_0_mp_rank_03_optim_states.pt successfully saved checkpoint at iteration 17050 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 time (ms) | save-checkpoint: 144194.90 iteration 17051/ 292968 | consumed samples: 34920448 | consumed tokens: 18503876608 | elapsed time per iteration (ms): 302274.7 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.740866E+00 | loss scale: 65536.0 | grad norm: 67553.266 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.007 | TFLOPs: 43.77 | iteration 17052/ 292968 | consumed samples: 34922496 | consumed tokens: 18505924608 | elapsed time per iteration (ms): 158445.8 | learning rate: 5.933E-05 | global 
batch size: 2048 | lm loss: 2.756342E+00 | loss scale: 65536.0 | grad norm: 97141.624 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.51 | iteration 17053/ 292968 | consumed samples: 34924544 | consumed tokens: 18507972608 | elapsed time per iteration (ms): 159441.1 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.740760E+00 | loss scale: 65536.0 | grad norm: 69294.814 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.99 | iteration 17054/ 292968 | consumed samples: 34926592 | consumed tokens: 18510020608 | elapsed time per iteration (ms): 158578.0 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.766817E+00 | loss scale: 65536.0 | grad norm: 69706.173 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.44 | iteration 17055/ 292968 | consumed samples: 34928640 | consumed tokens: 18512068608 | elapsed time per iteration (ms): 159143.0 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.770257E+00 | loss scale: 65536.0 | grad norm: 82333.699 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.14 | iteration 17056/ 292968 | consumed samples: 34930688 | consumed tokens: 18514116608 | elapsed time per iteration (ms): 159339.1 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.762411E+00 | loss scale: 65536.0 | grad norm: 69946.911 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.04 | iteration 17057/ 292968 | consumed samples: 34932736 | consumed tokens: 18516164608 | elapsed time per iteration (ms): 159042.2 | learning rate: 
5.933E-05 | global batch size: 2048 | lm loss: 2.756411E+00 | loss scale: 65536.0 | grad norm: 74057.409 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.19 | iteration 17058/ 292968 | consumed samples: 34934784 | consumed tokens: 18518212608 | elapsed time per iteration (ms): 160197.3 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.761990E+00 | loss scale: 65536.0 | grad norm: 56835.928 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.59 | iteration 17059/ 292968 | consumed samples: 34936832 | consumed tokens: 18520260608 | elapsed time per iteration (ms): 158860.5 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.740986E+00 | loss scale: 65536.0 | grad norm: 58353.268 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.29 | iteration 17060/ 292968 | consumed samples: 34938880 | consumed tokens: 18522308608 | elapsed time per iteration (ms): 159072.7 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.721293E+00 | loss scale: 65536.0 | grad norm: 77136.001 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.18 | iteration 17061/ 292968 | consumed samples: 34940928 | consumed tokens: 18524356608 | elapsed time per iteration (ms): 159053.3 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.754211E+00 | loss scale: 65536.0 | grad norm: 67021.381 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.19 | iteration 17062/ 292968 | consumed samples: 34942976 | consumed tokens: 18526404608 | elapsed time per iteration (ms): 158896.0 | 
learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.734320E+00 | loss scale: 65536.0 | grad norm: 57711.969 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.27 | iteration 17063/ 292968 | consumed samples: 34945024 | consumed tokens: 18528452608 | elapsed time per iteration (ms): 158785.0 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.779155E+00 | loss scale: 65536.0 | grad norm: 61489.050 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.33 | iteration 17064/ 292968 | consumed samples: 34947072 | consumed tokens: 18530500608 | elapsed time per iteration (ms): 158703.8 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.759706E+00 | loss scale: 65536.0 | grad norm: 94575.615 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.37 | iteration 17065/ 292968 | consumed samples: 34949120 | consumed tokens: 18532548608 | elapsed time per iteration (ms): 159371.3 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.773013E+00 | loss scale: 65536.0 | grad norm: 69340.768 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.02 | iteration 17066/ 292968 | consumed samples: 34951168 | consumed tokens: 18534596608 | elapsed time per iteration (ms): 158961.0 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.797102E+00 | loss scale: 65536.0 | grad norm: 96951.855 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.24 | iteration 17067/ 292968 | consumed samples: 34953216 | consumed tokens: 18536644608 | elapsed time per iteration 
(ms): 160254.1 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.812793E+00 | loss scale: 65536.0 | grad norm: 90376.460 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.57 | iteration 17068/ 292968 | consumed samples: 34955264 | consumed tokens: 18538692608 | elapsed time per iteration (ms): 158659.2 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.762567E+00 | loss scale: 65536.0 | grad norm: 100332.652 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.40 | iteration 17069/ 292968 | consumed samples: 34957312 | consumed tokens: 18540740608 | elapsed time per iteration (ms): 158799.6 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.767109E+00 | loss scale: 65536.0 | grad norm: 91946.853 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.32 | iteration 17070/ 292968 | consumed samples: 34959360 | consumed tokens: 18542788608 | elapsed time per iteration (ms): 166742.5 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.728962E+00 | loss scale: 65536.0 | grad norm: 78129.888 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 79.35 | iteration 17071/ 292968 | consumed samples: 34961408 | consumed tokens: 18544836608 | elapsed time per iteration (ms): 159376.5 | learning rate: 5.933E-05 | global batch size: 2048 | lm loss: 2.747668E+00 | loss scale: 65536.0 | grad norm: 80816.001 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.02 | iteration 17072/ 292968 | consumed samples: 34963456 | consumed tokens: 18546884608 | elapsed 
time per iteration (ms): 158369.4 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.724310E+00 | loss scale: 65536.0 | grad norm: 69163.402 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.55 | iteration 17073/ 292968 | consumed samples: 34965504 | consumed tokens: 18548932608 | elapsed time per iteration (ms): 158661.2 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.717522E+00 | loss scale: 65536.0 | grad norm: 69114.025 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.39 | iteration 17074/ 292968 | consumed samples: 34967552 | consumed tokens: 18550980608 | elapsed time per iteration (ms): 158710.6 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.735413E+00 | loss scale: 65536.0 | grad norm: 57908.332 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.37 | iteration 17075/ 292968 | consumed samples: 34969600 | consumed tokens: 18553028608 | elapsed time per iteration (ms): 158338.8 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.720541E+00 | loss scale: 65536.0 | grad norm: 79097.619 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.56 | iteration 17076/ 292968 | consumed samples: 34971648 | consumed tokens: 18555076608 | elapsed time per iteration (ms): 158493.5 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.708616E+00 | loss scale: 65536.0 | grad norm: 50589.381 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.48 | iteration 17077/ 292968 | consumed samples: 34973696 | consumed tokens: 
18557124608 | elapsed time per iteration (ms): 160559.0 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.720264E+00 | loss scale: 65536.0 | grad norm: 57370.785 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.41 | iteration 17078/ 292968 | consumed samples: 34975744 | consumed tokens: 18559172608 | elapsed time per iteration (ms): 158651.4 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.730410E+00 | loss scale: 65536.0 | grad norm: 87958.390 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.40 | iteration 17079/ 292968 | consumed samples: 34977792 | consumed tokens: 18561220608 | elapsed time per iteration (ms): 158804.4 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.721580E+00 | loss scale: 65536.0 | grad norm: 52017.571 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.32 | iteration 17080/ 292968 | consumed samples: 34979840 | consumed tokens: 18563268608 | elapsed time per iteration (ms): 159766.3 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.722080E+00 | loss scale: 65536.0 | grad norm: 73480.596 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.82 | iteration 17081/ 292968 | consumed samples: 34981888 | consumed tokens: 18565316608 | elapsed time per iteration (ms): 158869.9 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.708000E+00 | loss scale: 65536.0 | grad norm: 62418.794 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.29 | iteration 17082/ 292968 | consumed samples: 34983936 | 
consumed tokens: 18567364608 | elapsed time per iteration (ms): 159275.2 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.724328E+00 | loss scale: 65536.0 | grad norm: 112796.758 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.07 | iteration 17083/ 292968 | consumed samples: 34985984 | consumed tokens: 18569412608 | elapsed time per iteration (ms): 163307.8 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.738617E+00 | loss scale: 65536.0 | grad norm: 55416.857 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 81.02 | iteration 17084/ 292968 | consumed samples: 34988032 | consumed tokens: 18571460608 | elapsed time per iteration (ms): 158661.6 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.740738E+00 | loss scale: 65536.0 | grad norm: 134201.144 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.39 | iteration 17085/ 292968 | consumed samples: 34990080 | consumed tokens: 18573508608 | elapsed time per iteration (ms): 158984.6 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.738824E+00 | loss scale: 65536.0 | grad norm: 65648.590 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.23 | iteration 17086/ 292968 | consumed samples: 34992128 | consumed tokens: 18575556608 | elapsed time per iteration (ms): 159242.1 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.756643E+00 | loss scale: 65536.0 | grad norm: 119608.453 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.09 | iteration 17087/ 292968 | consumed 
samples: 34994176 | consumed tokens: 18577604608 | elapsed time per iteration (ms): 158879.8 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.749526E+00 | loss scale: 65536.0 | grad norm: 99136.964 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.28 | iteration 17088/ 292968 | consumed samples: 34996224 | consumed tokens: 18579652608 | elapsed time per iteration (ms): 158406.8 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.741938E+00 | loss scale: 65536.0 | grad norm: 96175.975 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.53 | iteration 17089/ 292968 | consumed samples: 34998272 | consumed tokens: 18581700608 | elapsed time per iteration (ms): 158762.2 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.741143E+00 | loss scale: 65536.0 | grad norm: 70301.300 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.34 | iteration 17090/ 292968 | consumed samples: 35000320 | consumed tokens: 18583748608 | elapsed time per iteration (ms): 160245.3 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.705019E+00 | loss scale: 65536.0 | grad norm: 104355.172 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.57 | iteration 17091/ 292968 | consumed samples: 35002368 | consumed tokens: 18585796608 | elapsed time per iteration (ms): 159076.7 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.741728E+00 | loss scale: 65536.0 | grad norm: 62629.400 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.18 | iteration 17092/ 
292968 | consumed samples: 35004416 | consumed tokens: 18587844608 | elapsed time per iteration (ms): 158892.4 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.750970E+00 | loss scale: 65536.0 | grad norm: 84822.725 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.27 | iteration 17093/ 292968 | consumed samples: 35006464 | consumed tokens: 18589892608 | elapsed time per iteration (ms): 159135.8 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.764606E+00 | loss scale: 65536.0 | grad norm: 60921.692 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.15 | iteration 17094/ 292968 | consumed samples: 35008512 | consumed tokens: 18591940608 | elapsed time per iteration (ms): 158683.9 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.711609E+00 | loss scale: 65536.0 | grad norm: 62879.881 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.38 | iteration 17095/ 292968 | consumed samples: 35010560 | consumed tokens: 18593988608 | elapsed time per iteration (ms): 158732.7 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.725972E+00 | loss scale: 65536.0 | grad norm: 65654.269 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.36 | iteration 17096/ 292968 | consumed samples: 35012608 | consumed tokens: 18596036608 | elapsed time per iteration (ms): 159337.6 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.726970E+00 | loss scale: 65536.0 | grad norm: 63703.326 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.04 | 
iteration 17097/ 292968 | consumed samples: 35014656 | consumed tokens: 18598084608 | elapsed time per iteration (ms): 158778.9 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.736814E+00 | loss scale: 65536.0 | grad norm: 66289.284 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.33 |
iteration 17098/ 292968 | consumed samples: 35016704 | consumed tokens: 18600132608 | elapsed time per iteration (ms): 158987.2 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.740655E+00 | loss scale: 65536.0 | grad norm: 68931.628 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.22 |
iteration 17099/ 292968 | consumed samples: 35018752 | consumed tokens: 18602180608 | elapsed time per iteration (ms): 159077.5 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.762902E+00 | loss scale: 65536.0 | grad norm: 81884.175 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.18 |
iteration 17100/ 292968 | consumed samples: 35020800 | consumed tokens: 18604228608 | elapsed time per iteration (ms): 158618.4 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.737853E+00 | loss scale: 65536.0 | grad norm: 67240.064 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.42 |
-------------------------------------------------------------------------------------------
valid loss at iteration 17100 | lm loss value: 3.107989E+00 | lm loss PPL: 2.237601E+01 |
-------------------------------------------------------------------------------------------
saving checkpoint at iteration 17100 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
[2022-02-06 20:49:35,837] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/mp_rank_00_model_states.pt [2022-02-06 20:49:35,979] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/mp_rank_01_model_states.pt [2022-02-06 20:49:50,369] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-02-06 20:49:50,799] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-02-06 20:49:51,067] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-02-06 20:49:52,034] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-02-06 20:49:52,078] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-02-06 20:49:52,401] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-02-06 20:49:52,864] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-02-06 
20:49:53,247] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-02-06 20:49:53,553] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-02-06 20:49:53,795] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-02-06 20:49:53,818] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-02-06 20:49:53,909] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-02-06 20:49:53,984] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-02-06 20:49:54,017] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-02-06 20:49:54,060] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-02-06 20:49:54,106] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-02-06 20:49:54,109] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-02-06 20:49:54,131] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-02-06 20:49:54,618] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-02-06 20:49:55,283] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-02-06 20:49:55,308] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-02-06 20:49:55,455] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-02-06 20:49:55,481] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-02-06 20:49:55,681] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-02-06 20:49:57,561] [INFO] [engine.py:3023:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-02-06 20:49:57,571] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-02-06 20:49:57,730] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-02-06 20:49:57,745] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-02-06 20:49:58,290] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-02-06 20:49:58,513] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-02-06 20:49:59,008] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-02-06 20:49:59,196] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-02-06 20:49:59,229] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-02-06 20:49:59,776] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-02-06 20:50:00,045] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-02-06 20:50:00,075] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-02-06 20:50:00,348] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-02-06 20:50:00,349] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-02-06 20:50:00,654] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-02-06 20:50:00,977] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-02-06 20:50:00,986] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-02-06 20:50:01,134] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-02-06 
20:50:01,574] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-02-06 20:50:01,877] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-02-06 20:50:01,961] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-02-06 20:50:02,450] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-02-06 20:50:02,578] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-02-06 20:50:02,640] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-02-06 20:50:02,649] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-02-06 20:50:03,026] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-02-06 20:50:03,423] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-02-06 20:50:03,736] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-02-06 20:50:03,786] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-02-06 20:50:04,285] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-02-06 20:50:04,370] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-02-06 20:50:04,670] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-02-06 20:50:04,695] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-02-06 20:50:04,952] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-02-06 20:50:05,232] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-02-06 20:50:05,406] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-02-06 20:50:05,478] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-02-06 20:50:06,657] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-02-06 20:50:07,037] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-02-06 20:50:08,147] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-02-06 20:50:08,556] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-02-06 20:50:08,625] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-02-06 20:50:09,045] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-02-06 20:50:09,082] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-02-06 20:50:09,126] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-02-06 20:50:09,156] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-02-06 20:50:09,261] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-02-06 20:50:09,445] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-02-06 20:50:09,485] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-02-06 20:50:10,033] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-02-06 20:50:10,084] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-02-06 20:50:10,100] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-02-06 20:50:10,175] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-02-06 
20:50:10,297] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-02-06 20:50:10,324] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-02-06 20:50:10,464] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-02-06 20:50:10,468] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-02-06 20:50:10,482] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-02-06 20:50:10,719] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-02-06 20:50:10,724] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-02-06 20:50:10,910] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-02-06 20:50:10,925] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-02-06 20:50:10,945] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-02-06 20:50:10,979] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-02-06 20:50:11,002] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-02-06 20:50:11,104] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-02-06 20:50:11,153] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-02-06 20:50:11,163] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-02-06 20:50:11,255] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-02-06 20:50:11,303] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-02-06 20:50:11,323] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-02-06 20:50:11,334] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-02-06 20:50:11,571] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-02-06 20:50:11,592] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-02-06 20:50:11,644] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-02-06 20:50:11,652] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-02-06 20:50:11,664] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-02-06 20:50:11,777] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-02-06 20:50:11,920] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_124_optim_states.pt [2022-02-06 20:50:12,053] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-02-06 20:50:12,141] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-02-06 20:50:12,309] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-02-06 20:50:12,382] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-02-06 20:50:12,453] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-02-06 20:50:12,463] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-02-06 20:50:12,601] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-02-06 20:50:12,652] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-02-06 20:50:12,800] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-02-06 
20:50:12,870] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-02-06 20:50:12,896] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-02-06 20:50:13,247] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-02-06 20:50:13,373] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-02-06 20:50:13,411] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-02-06 20:50:16,924] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-02-06 20:50:17,105] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-02-06 20:50:16,952] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-02-06 20:50:17,162] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-02-06 20:50:17,370] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-02-06 20:50:18,519] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-02-06 20:50:18,581] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-02-06 20:50:18,820] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-02-06 20:50:18,882] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-02-06 20:50:20,258] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-02-06 20:50:20,327] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17100/zero_pp_rank_0_mp_rank_27_optim_states.pt successfully saved checkpoint at iteration 17100 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 time (ms) | save-checkpoint: 49329.94 iteration 17101/ 292968 | consumed samples: 35022848 | consumed tokens: 18606276608 | elapsed time per iteration (ms): 663765.3 | learning 
rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.742996E+00 | loss scale: 65536.0 | grad norm: 52723.824 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.003 | TFLOPs: 19.93 | iteration 17102/ 292968 | consumed samples: 35024896 | consumed tokens: 18608324608 | elapsed time per iteration (ms): 158747.3 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.711210E+00 | loss scale: 65536.0 | grad norm: 50669.109 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.35 | iteration 17103/ 292968 | consumed samples: 35026944 | consumed tokens: 18610372608 | elapsed time per iteration (ms): 158786.9 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.746686E+00 | loss scale: 65536.0 | grad norm: 38589.695 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.33 | iteration 17104/ 292968 | consumed samples: 35028992 | consumed tokens: 18612420608 | elapsed time per iteration (ms): 158831.7 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.721142E+00 | loss scale: 65536.0 | grad norm: 59925.095 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.31 | iteration 17105/ 292968 | consumed samples: 35031040 | consumed tokens: 18614468608 | elapsed time per iteration (ms): 159147.4 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.752824E+00 | loss scale: 65536.0 | grad norm: 61989.902 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.14 | iteration 17106/ 292968 | consumed samples: 35033088 | consumed tokens: 18616516608 | elapsed time per iteration (ms): 
158738.2 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.733414E+00 | loss scale: 65536.0 | grad norm: 73562.095 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.35 | iteration 17107/ 292968 | consumed samples: 35035136 | consumed tokens: 18618564608 | elapsed time per iteration (ms): 158940.2 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.750814E+00 | loss scale: 65536.0 | grad norm: 73165.567 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.25 | iteration 17108/ 292968 | consumed samples: 35037184 | consumed tokens: 18620612608 | elapsed time per iteration (ms): 158442.7 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.717760E+00 | loss scale: 65536.0 | grad norm: 61483.036 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.51 | [2022-02-06 21:14:09,360] [INFO] [stage_1_and_2.py:1648:step] [deepscale] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 65536.0, reducing to 65536.0 iteration 17109/ 292968 | consumed samples: 35039232 | consumed tokens: 18622660608 | elapsed time per iteration (ms): 158873.4 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.728264E+00 | loss scale: 65536.0 | grad norm: 61483.036 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.28 | iteration 17110/ 292968 | consumed samples: 35041280 | consumed tokens: 18624708608 | elapsed time per iteration (ms): 158851.7 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.734252E+00 | loss scale: 65536.0 | grad norm: 86834.344 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.29 | iteration 17111/ 292968 | consumed samples: 35043328 | consumed tokens: 18626756608 | elapsed time per iteration (ms): 158671.3 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.742181E+00 | loss scale: 65536.0 | grad norm: 64825.865 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.39 | iteration 17112/ 292968 | consumed samples: 35045376 | consumed tokens: 18628804608 | elapsed time per iteration (ms): 158928.2 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.743757E+00 | loss scale: 65536.0 | grad norm: 70827.478 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.25 | iteration 17113/ 292968 | consumed samples: 35047424 | consumed tokens: 18630852608 | elapsed time per iteration (ms): 158660.9 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.731973E+00 | loss scale: 65536.0 | grad norm: 60129.230 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number 
of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.39 | iteration 17114/ 292968 | consumed samples: 35049472 | consumed tokens: 18632900608 | elapsed time per iteration (ms): 159049.6 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.759015E+00 | loss scale: 65536.0 | grad norm: 84238.085 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.19 | iteration 17115/ 292968 | consumed samples: 35051520 | consumed tokens: 18634948608 | elapsed time per iteration (ms): 158937.3 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.756507E+00 | loss scale: 65536.0 | grad norm: 51152.171 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.25 | iteration 17116/ 292968 | consumed samples: 35053568 | consumed tokens: 18636996608 | elapsed time per iteration (ms): 158709.5 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.738040E+00 | loss scale: 65536.0 | grad norm: 79861.338 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.37 | iteration 17117/ 292968 | consumed samples: 35055616 | consumed tokens: 18639044608 | elapsed time per iteration (ms): 159035.6 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.740876E+00 | loss scale: 65536.0 | grad norm: 70821.189 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.20 | iteration 17118/ 292968 | consumed samples: 35057664 | consumed tokens: 18641092608 | elapsed time per iteration (ms): 159081.4 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.749951E+00 | loss scale: 65536.0 | grad norm: 52598.509 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.17 | iteration 17119/ 292968 | consumed samples: 35059712 | consumed tokens: 18643140608 | elapsed time per iteration (ms): 158898.4 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.738863E+00 | loss scale: 65536.0 | grad norm: 64565.612 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.27 | iteration 17120/ 292968 | consumed samples: 35061760 | consumed tokens: 18645188608 | elapsed time per iteration (ms): 159003.3 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.760359E+00 | loss scale: 65536.0 | grad norm: 77815.559 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.22 | iteration 17121/ 292968 | consumed samples: 35063808 | consumed tokens: 18647236608 | elapsed time per iteration (ms): 159146.4 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.749543E+00 | loss scale: 65536.0 | grad norm: 64922.373 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.14 | iteration 17122/ 292968 | consumed samples: 35065856 | consumed tokens: 18649284608 | elapsed time per iteration (ms): 158934.6 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.738429E+00 | loss scale: 65536.0 | grad norm: 63837.112 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.25 | iteration 17123/ 292968 | consumed samples: 35067904 | consumed tokens: 18651332608 | elapsed time per iteration (ms): 158860.6 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.730717E+00 | loss scale: 65536.0 | grad norm: 73948.361 | num zeros: 0.0 | curriculum seqlen: 1000 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.29 | iteration 17124/ 292968 | consumed samples: 35069952 | consumed tokens: 18653380608 | elapsed time per iteration (ms): 158757.2 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.773272E+00 | loss scale: 65536.0 | grad norm: 65059.784 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.34 | iteration 17125/ 292968 | consumed samples: 35072000 | consumed tokens: 18655428608 | elapsed time per iteration (ms): 159515.6 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.736591E+00 | loss scale: 65536.0 | grad norm: 72721.478 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.95 | iteration 17126/ 292968 | consumed samples: 35074048 | consumed tokens: 18657476608 | elapsed time per iteration (ms): 158917.6 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.749594E+00 | loss scale: 65536.0 | grad norm: 62824.826 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.26 | iteration 17127/ 292968 | consumed samples: 35076096 | consumed tokens: 18659524608 | elapsed time per iteration (ms): 158882.0 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.765779E+00 | loss scale: 65536.0 | grad norm: 84469.767 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.28 | iteration 17128/ 292968 | consumed samples: 35078144 | consumed tokens: 18661572608 | elapsed time per iteration (ms): 159003.1 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.762455E+00 | loss scale: 65536.0 | grad norm: 45880.033 | num zeros: 0.0 | curriculum 
seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.22 | iteration 17129/ 292968 | consumed samples: 35080192 | consumed tokens: 18663620608 | elapsed time per iteration (ms): 159563.7 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.744232E+00 | loss scale: 65536.0 | grad norm: 88862.871 | num zeros: 0.0 | curriculum seqlen: 1000 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.92 | iteration 17130/ 292968 | consumed samples: 35082240 | consumed tokens: 18665684992 | elapsed time per iteration (ms): 158865.1 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.732873E+00 | loss scale: 65536.0 | grad norm: 67041.015 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.95 | iteration 17131/ 292968 | consumed samples: 35084288 | consumed tokens: 18667749376 | elapsed time per iteration (ms): 158952.9 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.796029E+00 | loss scale: 65536.0 | grad norm: 62798.186 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.91 | iteration 17132/ 292968 | consumed samples: 35086336 | consumed tokens: 18669813760 | elapsed time per iteration (ms): 159243.5 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.780420E+00 | loss scale: 65536.0 | grad norm: 62550.637 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.75 | iteration 17133/ 292968 | consumed samples: 35088384 | consumed tokens: 18671878144 | elapsed time per iteration (ms): 159143.8 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.744705E+00 | loss scale: 65536.0 | grad norm: 72424.214 | num zeros: 
0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.81 | iteration 17134/ 292968 | consumed samples: 35090432 | consumed tokens: 18673942528 | elapsed time per iteration (ms): 158447.5 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.758261E+00 | loss scale: 65536.0 | grad norm: 73680.789 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.18 | iteration 17135/ 292968 | consumed samples: 35092480 | consumed tokens: 18676006912 | elapsed time per iteration (ms): 159273.1 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.740274E+00 | loss scale: 65536.0 | grad norm: 62556.860 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.74 | iteration 17136/ 292968 | consumed samples: 35094528 | consumed tokens: 18678071296 | elapsed time per iteration (ms): 158608.0 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.757091E+00 | loss scale: 65536.0 | grad norm: 65301.629 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.09 | iteration 17137/ 292968 | consumed samples: 35096576 | consumed tokens: 18680135680 | elapsed time per iteration (ms): 158531.7 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.759795E+00 | loss scale: 65536.0 | grad norm: 71867.806 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.13 | iteration 17138/ 292968 | consumed samples: 35098624 | consumed tokens: 18682200064 | elapsed time per iteration (ms): 158733.1 | learning rate: 5.932E-05 | global batch size: 2048 | lm loss: 2.750237E+00 | loss scale: 65536.0 | grad norm: 
66891.195 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.02 | iteration 17139/ 292968 | consumed samples: 35100672 | consumed tokens: 18684264448 | elapsed time per iteration (ms): 158505.3 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.770636E+00 | loss scale: 65536.0 | grad norm: 88310.481 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.14 | iteration 17140/ 292968 | consumed samples: 35102720 | consumed tokens: 18686328832 | elapsed time per iteration (ms): 158772.3 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.728343E+00 | loss scale: 65536.0 | grad norm: 49183.286 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.00 | iteration 17141/ 292968 | consumed samples: 35104768 | consumed tokens: 18688393216 | elapsed time per iteration (ms): 158559.5 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.766794E+00 | loss scale: 65536.0 | grad norm: 79350.559 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.12 | iteration 17142/ 292968 | consumed samples: 35106816 | consumed tokens: 18690457600 | elapsed time per iteration (ms): 158826.0 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.789883E+00 | loss scale: 65536.0 | grad norm: 69994.862 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.97 | iteration 17143/ 292968 | consumed samples: 35108864 | consumed tokens: 18692521984 | elapsed time per iteration (ms): 158487.4 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.766624E+00 | loss scale: 
65536.0 | grad norm: 76851.211 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.15 | iteration 17144/ 292968 | consumed samples: 35110912 | consumed tokens: 18694586368 | elapsed time per iteration (ms): 158781.7 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.766554E+00 | loss scale: 65536.0 | grad norm: 75511.670 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.00 | iteration 17145/ 292968 | consumed samples: 35112960 | consumed tokens: 18696650752 | elapsed time per iteration (ms): 158675.4 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.776926E+00 | loss scale: 65536.0 | grad norm: 100124.039 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.05 | iteration 17146/ 292968 | consumed samples: 35115008 | consumed tokens: 18698715136 | elapsed time per iteration (ms): 158498.9 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.784070E+00 | loss scale: 65536.0 | grad norm: 58245.131 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.15 | iteration 17147/ 292968 | consumed samples: 35117056 | consumed tokens: 18700779520 | elapsed time per iteration (ms): 158791.8 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.784075E+00 | loss scale: 65536.0 | grad norm: 93550.816 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.99 | iteration 17148/ 292968 | consumed samples: 35119104 | consumed tokens: 18702843904 | elapsed time per iteration (ms): 159248.5 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 
2.788463E+00 | loss scale: 65536.0 | grad norm: 83120.694 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.75 | iteration 17149/ 292968 | consumed samples: 35121152 | consumed tokens: 18704908288 | elapsed time per iteration (ms): 158837.8 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.803448E+00 | loss scale: 65536.0 | grad norm: 89794.392 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.97 | iteration 17150/ 292968 | consumed samples: 35123200 | consumed tokens: 18706972672 | elapsed time per iteration (ms): 158501.9 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.772990E+00 | loss scale: 65536.0 | grad norm: 79018.501 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.15 | saving checkpoint at iteration 17150 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 [2022-02-06 23:02:47,703] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/mp_rank_00_model_states.pt [2022-02-06 23:02:47,827] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/mp_rank_01_model_states.pt [2022-02-06 23:03:02,443] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-02-06 23:03:02,460] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_107_optim_states.pt 
[2022-02-06 23:03:02,595] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-02-06 23:03:02,659] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-02-06 23:03:02,864] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-02-06 23:03:03,876] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-02-06 23:03:04,604] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-02-06 23:03:04,705] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-02-06 23:03:04,785] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-02-06 23:03:04,950] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-02-06 23:03:04,997] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-02-06 23:03:05,270] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-02-06 23:03:05,387] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-02-06 23:03:05,406] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-02-06 23:03:05,672] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-02-06 23:03:05,746] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-02-06 23:03:05,751] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-02-06 23:03:06,295] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-02-06 23:03:06,367] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-02-06 23:03:06,878] [INFO] [engine.py:3023:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-02-06 23:03:07,287] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-02-06 23:03:07,405] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-02-06 23:03:07,407] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-02-06 23:03:07,430] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-02-06 23:03:09,378] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-02-06 23:03:09,479] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-02-06 23:03:10,848] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-02-06 23:03:11,624] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-02-06 23:03:12,521] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-02-06 23:03:12,502] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-02-06 23:03:12,709] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-02-06 23:03:13,211] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-02-06 23:03:13,224] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-02-06 23:03:13,867] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-02-06 23:03:14,197] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-02-06 23:03:14,278] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-02-06 23:03:14,289] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-02-06 
23:03:14,340] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-02-06 23:03:14,620] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-02-06 23:03:14,695] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-02-06 23:03:14,909] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-02-06 23:03:14,963] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-02-06 23:03:15,141] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-02-06 23:03:15,187] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-02-06 23:03:15,216] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-02-06 23:03:15,394] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-02-06 23:03:15,426] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-02-06 23:03:15,570] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-02-06 23:03:15,707] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-02-06 23:03:15,775] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-02-06 23:03:15,841] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-02-06 23:03:16,044] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-02-06 23:03:16,167] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-02-06 23:03:16,189] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-02-06 23:03:16,210] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-02-06 23:03:16,292] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-02-06 23:03:16,346] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-02-06 23:03:16,400] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-02-06 23:03:16,413] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-02-06 23:03:16,423] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-02-06 23:03:16,500] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-02-06 23:03:16,490] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-02-06 23:03:16,543] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-02-06 23:03:16,595] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-02-06 23:03:16,708] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-02-06 23:03:16,710] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-02-06 23:03:16,777] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-02-06 23:03:16,881] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-02-06 23:03:16,926] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-02-06 23:03:16,930] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-02-06 23:03:16,958] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-02-06 23:03:17,034] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-02-06 
23:03:17,136] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-02-06 23:03:17,146] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-02-06 23:03:17,172] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-02-06 23:03:17,206] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-02-06 23:03:17,346] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-02-06 23:03:17,394] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-02-06 23:03:17,466] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-02-06 23:03:17,745] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-02-06 23:03:18,246] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-02-06 23:03:18,297] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-02-06 23:03:18,407] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-02-06 23:03:18,456] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-02-06 23:03:18,498] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-02-06 23:03:18,707] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-02-06 23:03:18,970] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-02-06 23:03:19,037] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-02-06 23:03:19,703] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-02-06 23:03:19,765] [INFO] [engine.py:3023:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-02-06 23:03:20,171] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-02-06 23:03:20,272] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-02-06 23:03:20,282] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-02-06 23:03:20,734] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-02-06 23:03:20,823] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-02-06 23:03:21,044] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-02-06 23:03:21,189] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-02-06 23:03:21,341] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-02-06 23:03:22,340] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-02-06 23:03:22,364] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-02-06 23:03:22,373] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-02-06 23:03:22,650] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-02-06 23:03:22,769] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-02-06 23:03:23,373] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-02-06 23:03:23,467] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-02-06 23:03:23,846] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-02-06 23:03:24,002] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-02-06 
23:03:24,065] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-02-06 23:03:24,087] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-02-06 23:03:24,158] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-02-06 23:03:24,198] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-02-06 23:03:24,338] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-02-06 23:03:24,545] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-02-06 23:03:24,633] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-02-06 23:03:24,741] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-02-06 23:03:25,209] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-02-06 23:03:25,250] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-02-06 23:03:25,565] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-02-06 23:03:25,603] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-02-06 23:03:27,052] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-02-06 23:03:29,048] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-02-06 23:03:29,189] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-02-06 23:03:33,373] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_124_optim_states.pt [2022-02-06 23:03:33,851] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-02-06 23:03:35,407] [INFO] [engine.py:3023:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-02-06 23:03:35,630] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-02-06 23:03:36,879] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-02-06 23:03:37,017] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17150/zero_pp_rank_0_mp_rank_123_optim_states.pt successfully saved checkpoint at iteration 17150 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 time (ms) | save-checkpoint: 53959.97 iteration 17151/ 292968 | consumed samples: 35125248 | consumed tokens: 18709037056 | elapsed time per iteration (ms): 212063.1 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.786308E+00 | loss scale: 65536.0 | grad norm: 86705.962 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.010 | TFLOPs: 62.89 | iteration 17152/ 292968 | consumed samples: 35127296 | consumed tokens: 18711101440 | elapsed time per iteration (ms): 158386.5 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.799742E+00 | loss scale: 65536.0 | grad norm: 55222.880 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.21 | iteration 17153/ 292968 | consumed samples: 35129344 | consumed tokens: 18713165824 | elapsed time per iteration (ms): 158180.0 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.800851E+00 
| loss scale: 65536.0 | grad norm: 113633.649 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.32 | iteration 17154/ 292968 | consumed samples: 35131392 | consumed tokens: 18715230208 | elapsed time per iteration (ms): 158455.1 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.757282E+00 | loss scale: 65536.0 | grad norm: 74861.999 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.17 | iteration 17155/ 292968 | consumed samples: 35133440 | consumed tokens: 18717294592 | elapsed time per iteration (ms): 158772.1 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.794053E+00 | loss scale: 65536.0 | grad norm: 74091.739 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.00 | iteration 17156/ 292968 | consumed samples: 35135488 | consumed tokens: 18719358976 | elapsed time per iteration (ms): 158448.4 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.785101E+00 | loss scale: 65536.0 | grad norm: 172889.464 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.17 | iteration 17157/ 292968 | consumed samples: 35137536 | consumed tokens: 18721423360 | elapsed time per iteration (ms): 158807.6 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.824795E+00 | loss scale: 65536.0 | grad norm: 83490.366 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.98 | iteration 17158/ 292968 | consumed samples: 35139584 | consumed tokens: 18723487744 | elapsed time per iteration (ms): 158770.3 | learning rate: 5.931E-05 | global batch size: 2048 | lm 
loss: 2.806708E+00 | loss scale: 65536.0 | grad norm: 122617.681 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.00 | iteration 17159/ 292968 | consumed samples: 35141632 | consumed tokens: 18725552128 | elapsed time per iteration (ms): 158831.1 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.785352E+00 | loss scale: 65536.0 | grad norm: 80404.115 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.97 | iteration 17160/ 292968 | consumed samples: 35143680 | consumed tokens: 18727616512 | elapsed time per iteration (ms): 158971.2 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.795693E+00 | loss scale: 65536.0 | grad norm: 95802.694 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.90 | iteration 17161/ 292968 | consumed samples: 35145728 | consumed tokens: 18729680896 | elapsed time per iteration (ms): 159960.6 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.807747E+00 | loss scale: 65536.0 | grad norm: 76916.583 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.38 | iteration 17162/ 292968 | consumed samples: 35147776 | consumed tokens: 18731745280 | elapsed time per iteration (ms): 160247.0 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.821066E+00 | loss scale: 65536.0 | grad norm: 83578.203 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.23 | iteration 17163/ 292968 | consumed samples: 35149824 | consumed tokens: 18733809664 | elapsed time per iteration (ms): 159640.0 | learning rate: 5.931E-05 | global batch 
size: 2048 | lm loss: 2.830394E+00 | loss scale: 65536.0 | grad norm: 92451.260 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.55 | saving checkpoint at iteration 17163 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 [2022-02-06 23:38:08,145] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/mp_rank_01_model_states.pt [2022-02-06 23:38:08,325] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/mp_rank_00_model_states.pt [2022-02-06 23:38:21,169] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-02-06 23:38:22,022] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-02-06 23:38:21,986] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-02-06 23:38:22,176] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-02-06 23:38:23,120] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-02-06 23:38:23,794] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-02-06 23:38:23,928] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-02-06 23:38:23,932] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-02-06 23:38:24,538] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-02-06 23:38:24,559] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-02-06 23:38:24,627] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-02-06 23:38:24,719] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-02-06 23:38:26,071] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-02-06 23:38:26,148] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-02-06 23:38:26,170] [INFO] [engine.py:3023:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-02-06 23:38:26,189] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-02-06 23:38:26,263] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-02-06 23:38:27,598] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-02-06 23:38:27,783] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-02-06 23:38:28,471] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-02-06 23:38:28,479] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-02-06 23:38:28,488] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-02-06 23:38:28,743] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-02-06 23:38:28,743] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-02-06 23:38:28,781] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-02-06 23:38:28,834] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-02-06 23:38:29,021] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-02-06 23:38:29,043] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-02-06 23:38:29,373] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-02-06 23:38:29,805] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-02-06 23:38:30,312] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-02-06 23:38:30,493] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_116_optim_states.pt 
[2022-02-06 23:38:31,046] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-02-06 23:38:31,139] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-02-06 23:38:31,320] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-02-06 23:38:31,602] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-02-06 23:38:31,911] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-02-06 23:38:31,973] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-02-06 23:38:31,992] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-02-06 23:38:32,401] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-02-06 23:38:32,502] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-02-06 23:38:32,962] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-02-06 23:38:33,225] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-02-06 23:38:33,335] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-02-06 23:38:33,523] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-02-06 23:38:33,593] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-02-06 23:38:33,635] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-02-06 23:38:33,678] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-02-06 23:38:33,762] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-02-06 23:38:33,778] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-02-06 23:38:33,935] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-02-06 23:38:34,014] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-02-06 23:38:34,414] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-02-06 23:38:34,621] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-02-06 23:38:34,919] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-02-06 23:38:35,344] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-02-06 23:38:35,459] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-02-06 23:38:36,102] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-02-06 23:38:36,167] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-02-06 23:38:36,365] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-02-06 23:38:36,429] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-02-06 23:38:36,672] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-02-06 23:38:36,709] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-02-06 23:38:37,218] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-02-06 23:38:37,329] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-02-06 23:38:37,248] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-02-06 23:38:37,313] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-02-06 
23:38:37,378] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-02-06 23:38:37,449] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-02-06 23:38:37,694] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-02-06 23:38:37,729] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-02-06 23:38:37,752] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-02-06 23:38:37,735] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-02-06 23:38:37,889] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-02-06 23:38:38,155] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-02-06 23:38:38,453] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-02-06 23:38:38,685] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-02-06 23:38:38,716] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-02-06 23:38:39,610] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-02-06 23:38:39,628] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-02-06 23:38:39,637] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-02-06 23:38:39,772] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-02-06 23:38:39,957] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-02-06 23:38:40,011] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-02-06 23:38:40,118] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_93_optim_states.pt
[2022-02-06 23:38:40,233] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_94_optim_states.pt
[2022-02-06 23:38:40,276] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_90_optim_states.pt
[2022-02-06 23:38:40,297] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_91_optim_states.pt
[2022-02-06 23:38:41,012] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_69_optim_states.pt
[2022-02-06 23:38:41,256] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_88_optim_states.pt
[2022-02-06 23:38:41,386] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_68_optim_states.pt
[2022-02-06 23:38:41,412] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_89_optim_states.pt
[2022-02-06 23:38:41,433] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_53_optim_states.pt
[2022-02-06 23:38:41,462] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_52_optim_states.pt
[2022-02-06 23:38:41,506] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_48_optim_states.pt
[2022-02-06 23:38:41,586] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_120_optim_states.pt
[2022-02-06 23:38:41,674] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_55_optim_states.pt
[2022-02-06 23:38:41,712] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_25_optim_states.pt
[2022-02-06 23:38:41,843] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_54_optim_states.pt
[2022-02-06 23:38:41,925] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_24_optim_states.pt
[2022-02-06 23:38:41,987] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_83_optim_states.pt
[2022-02-06 23:38:42,069] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_82_optim_states.pt
[2022-02-06 23:38:42,106] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_49_optim_states.pt
[2022-02-06 23:38:42,213] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_87_optim_states.pt
[2022-02-06 23:38:42,237] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_86_optim_states.pt
[2022-02-06 23:38:43,008] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_51_optim_states.pt
[2022-02-06 23:38:43,255] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_28_optim_states.pt
[2022-02-06 23:38:43,269] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_29_optim_states.pt
[2022-02-06 23:38:43,296] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_85_optim_states.pt
[2022-02-06 23:38:43,415] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_62_optim_states.pt
[2022-02-06 23:38:43,453] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_84_optim_states.pt
[2022-02-06 23:38:43,466] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_63_optim_states.pt
[2022-02-06 23:38:43,476] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_50_optim_states.pt
[2022-02-06 23:38:43,774] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_81_optim_states.pt
[2022-02-06 23:38:43,992] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_80_optim_states.pt
[2022-02-06 23:38:46,377] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_03_optim_states.pt
[2022-02-06 23:38:46,688] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_121_optim_states.pt
[2022-02-06 23:38:47,021] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_02_optim_states.pt
[2022-02-06 23:38:47,883] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_126_optim_states.pt
[2022-02-06 23:38:48,281] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_127_optim_states.pt
[2022-02-06 23:38:48,510] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_125_optim_states.pt
[2022-02-06 23:38:48,564] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_124_optim_states.pt
[2022-02-06 23:38:49,548] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_123_optim_states.pt
[2022-02-06 23:38:49,680] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_122_optim_states.pt
[2022-02-06 23:38:53,983] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_05_optim_states.pt
[2022-02-06 23:38:54,017] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_04_optim_states.pt
[2022-02-06 23:38:57,306] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2022-02-06 23:38:57,389] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17163/zero_pp_rank_0_mp_rank_01_optim_states.pt
successfully saved checkpoint at iteration 17163
to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
time (ms) | save-checkpoint: 54798.13
[exiting program after 1187.3764813025794 minutes] datetime: 2022-02-06 23:38:57
/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun. Note that --use_env is set by default in torchrun. If your script expects `--local_rank` argument to be set, please change it to read from `os.environ['LOCAL_RANK']` instead. See https://pytorch.org/docs/stable/distributed.html#launch-utility for further instructions
warnings.warn(
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
using world size: 128, data-parallel-size: 1, tensor-model-parallel size: 4, pipeline-model-parallel size: 32
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.95
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. None
consumed_train_samples .......................... 0
consumed_train_tokens ........................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
curriculum_learning ............................. False
data_impl ....................................... mmap
data_parallel_size .............................. 1
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.1790286.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embed_layernorm ................................. True
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 150
eval_iters ...................................... 5
eval_only ....................................... None
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1185
exit_interval ................................... None
ffn_hidden_size ................................. 46400
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
gigaflos_no_embeds .............................. 0
global_batch_size ............................... 2048
glu_activation .................................. None
hidden_dropout .................................. 0.1
hidden_size ..................................... 11600
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.006
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 145
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 1
log_learning_rate_to_tensorboard ................ True
log_level ....................................... None
log_level_replica ............................... None
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_on_targets_only ............................ False
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 6e-05
lr_decay_iters .................................. None
lr_decay_samples ................................ None
lr_decay_style .................................. cosine
lr_decay_tokens ................................. 260000000000
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 3750000
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... 2048
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt
micro_batch_size ................................ 1
min_loss_scale .................................. 1.0
min_lr .......................................... 6e-06
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 80
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 64
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 32
position_embedding_type ......................... PositionEmbeddingType.absolute
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... None
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
reweight_loss_based_on_position_frequency ....... False
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
save_interval ................................... 50
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 43
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
skip_train_iteration_range ...................... None
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 4
tensorboard_debug_dir ........................... None
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs//tensorboard/cl-a100
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
test_weighted_split_names ....................... None
test_weighted_split_paths ....................... None
test_weighted_split_splits ...................... None
test_weighted_split_weights ..................... None
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 600000000
train_tokens .................................... 300000000000
train_weighted_split_paths ...................... None
use_bnb_optimizer ............................... False
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
valid_weighted_split_names ...................... None
valid_weighted_split_paths ...................... None
valid_weighted_split_splits ..................... None
valid_weighted_split_weights .................... None
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 128
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
setting number of micro-batches to constant 2048
> building GPT2BPETokenizer tokenizer ...
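Annotation (not part of the captured output): several values in this argument dump are derived from others, and the relationships can be checked with a few lines of arithmetic. The sketch below is a reader's cross-check under stated assumptions, not the launcher's own code. The input numbers are copied from the log. The rounding rule for the padded vocabulary (round up to a multiple of `make_vocab_size_divisible_by * tensor_model_parallel_size`) and the micro-batch formula are assumptions about how the training code derives them, chosen because they reproduce the logged values.

```python
# Values copied from the argument dump and tokenizer message in this log.
world_size = 128
tensor_model_parallel_size = 4
pipeline_model_parallel_size = 32
global_batch_size = 2048
micro_batch_size = 1
hidden_size = 11600
num_attention_heads = 80
make_vocab_size_divisible_by = 128
orig_vocab_size = 50257  # GPT-2 BPE vocabulary size reported by the tokenizer

# Data-parallel size is whatever is left after tensor and pipeline parallelism.
model_parallel_size = tensor_model_parallel_size * pipeline_model_parallel_size
assert world_size % model_parallel_size == 0
data_parallel_size = world_size // model_parallel_size

# Assumption: micro-batches per step = global batch / (micro batch * DP size).
num_micro_batches = global_batch_size // (micro_batch_size * data_parallel_size)

# Per-head projection width and the conventional 4x feed-forward width.
kv_channels = hidden_size // num_attention_heads
ffn_hidden_size = 4 * hidden_size

# Assumption: the vocabulary is rounded up to a multiple of
# make_vocab_size_divisible_by * tensor_model_parallel_size.
multiple = make_vocab_size_divisible_by * tensor_model_parallel_size
padded_vocab_size = -(-orig_vocab_size // multiple) * multiple  # ceiling division
dummy_tokens = padded_vocab_size - orig_vocab_size

print(f"data_parallel_size = {data_parallel_size}")
print(f"num_micro_batches  = {num_micro_batches}")
print(f"kv_channels        = {kv_channels}")
print(f"ffn_hidden_size    = {ffn_hidden_size}")
print(f"padded_vocab_size  = {padded_vocab_size} ({dummy_tokens} dummy tokens)")
```

With a data-parallel size of 1 and a micro-batch size of 1, every one of the 2048 samples in a global batch passes through the 32-stage pipeline as its own micro-batch, which matches the "constant 2048" message in the log.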
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
> setting tensorboard ...
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch']
torch version .................... 1.10.2
torch cuda version ............... 11.3
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed']
deepspeed info ................... 0.6.0+ba9c4cc7, ba9c4cc7, master
deepspeed wheel compiled w. ...... torch 1.10, cuda 11.3
**** Git info for Megatron: git_hash=f689231 git_branch=log-grad-norm ****
> initializing torch distributed ...
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 32
> setting random seeds to 43 ...
[2022-02-06 23:39:29,471] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2761 and data parallel seed: 43
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.125 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:295: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module scaled_upper_triang_masked_softmax_cuda... /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:295: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module scaled_masked_softmax_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module scaled_masked_softmax_cuda... /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:295: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. 
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module fused_mix_prec_layer_norm_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module fused_mix_prec_layer_norm_cuda... >>> done with compiling and loading fused kernels. Compilation time: 8.129 seconds time to initialize megatron (seconds): 41.726 [after megatron is initialized] datetime: 2022-02-06 23:39:37 building GPT model ... [2022-02-06 23:39:37,763] [INFO] [utils.py:824:see_memory_usage] Before Building Model [2022-02-06 23:39:37,764] [INFO] [utils.py:825:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB [2022-02-06 23:39:37,764] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 49.5 GB, percent = 9.8% SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=1, data=0, model=0): 4, ProcessCoord(pipe=1, data=0, model=1): 5, ProcessCoord(pipe=1, data=0, model=2): 6, ProcessCoord(pipe=1, data=0, model=3): 7, ProcessCoord(pipe=2, data=0, model=0): 8, ProcessCoord(pipe=2, data=0, model=1): 9, ProcessCoord(pipe=2, data=0, model=2): 10, ProcessCoord(pipe=2, data=0, model=3): 11, ProcessCoord(pipe=3, data=0, model=0): 12, ProcessCoord(pipe=3, data=0, model=1): 13, ProcessCoord(pipe=3, data=0, model=2): 14, ProcessCoord(pipe=3, data=0, model=3): 15, ProcessCoord(pipe=4, data=0, model=0): 
16, ProcessCoord(pipe=4, data=0, model=1): 17, ProcessCoord(pipe=4, data=0, model=2): 18, ProcessCoord(pipe=4, data=0, model=3): 19, ProcessCoord(pipe=5, data=0, model=0): 20, ProcessCoord(pipe=5, data=0, model=1): 21, ProcessCoord(pipe=5, data=0, model=2): 22, ProcessCoord(pipe=5, data=0, model=3): 23, ProcessCoord(pipe=6, data=0, model=0): 24, ProcessCoord(pipe=6, data=0, model=1): 25, ProcessCoord(pipe=6, data=0, model=2): 26, ProcessCoord(pipe=6, data=0, model=3): 27, ProcessCoord(pipe=7, data=0, model=0): 28, ProcessCoord(pipe=7, data=0, model=1): 29, ProcessCoord(pipe=7, data=0, model=2): 30, ProcessCoord(pipe=7, data=0, model=3): 31, ProcessCoord(pipe=8, data=0, model=0): 32, ProcessCoord(pipe=8, data=0, model=1): 33, ProcessCoord(pipe=8, data=0, model=2): 34, ProcessCoord(pipe=8, data=0, model=3): 35, ProcessCoord(pipe=9, data=0, model=0): 36, ProcessCoord(pipe=9, data=0, model=1): 37, ProcessCoord(pipe=9, data=0, model=2): 38, ProcessCoord(pipe=9, data=0, model=3): 39, ProcessCoord(pipe=10, data=0, model=0): 40, ProcessCoord(pipe=10, data=0, model=1): 41, ProcessCoord(pipe=10, data=0, model=2): 42, ProcessCoord(pipe=10, data=0, model=3): 43, ProcessCoord(pipe=11, data=0, model=0): 44, ProcessCoord(pipe=11, data=0, model=1): 45, ProcessCoord(pipe=11, data=0, model=2): 46, ProcessCoord(pipe=11, data=0, model=3): 47, ProcessCoord(pipe=12, data=0, model=0): 48, ProcessCoord(pipe=12, data=0, model=1): 49, ProcessCoord(pipe=12, data=0, model=2): 50, ProcessCoord(pipe=12, data=0, model=3): 51, ProcessCoord(pipe=13, data=0, model=0): 52, ProcessCoord(pipe=13, data=0, model=1): 53, ProcessCoord(pipe=13, data=0, model=2): 54, ProcessCoord(pipe=13, data=0, model=3): 55, ProcessCoord(pipe=14, data=0, model=0): 56, ProcessCoord(pipe=14, data=0, model=1): 57, ProcessCoord(pipe=14, data=0, model=2): 58, ProcessCoord(pipe=14, data=0, model=3): 59, ProcessCoord(pipe=15, data=0, model=0): 60, ProcessCoord(pipe=15, data=0, model=1): 61, ProcessCoord(pipe=15, data=0, 
model=2): 62, ProcessCoord(pipe=15, data=0, model=3): 63, ProcessCoord(pipe=16, data=0, model=0): 64, ProcessCoord(pipe=16, data=0, model=1): 65, ProcessCoord(pipe=16, data=0, model=2): 66, ProcessCoord(pipe=16, data=0, model=3): 67, ProcessCoord(pipe=17, data=0, model=0): 68, ProcessCoord(pipe=17, data=0, model=1): 69, ProcessCoord(pipe=17, data=0, model=2): 70, ProcessCoord(pipe=17, data=0, model=3): 71, ProcessCoord(pipe=18, data=0, model=0): 72, ProcessCoord(pipe=18, data=0, model=1): 73, ProcessCoord(pipe=18, data=0, model=2): 74, ProcessCoord(pipe=18, data=0, model=3): 75, ProcessCoord(pipe=19, data=0, model=0): 76, ProcessCoord(pipe=19, data=0, model=1): 77, ProcessCoord(pipe=19, data=0, model=2): 78, ProcessCoord(pipe=19, data=0, model=3): 79, ProcessCoord(pipe=20, data=0, model=0): 80, ProcessCoord(pipe=20, data=0, model=1): 81, ProcessCoord(pipe=20, data=0, model=2): 82, ProcessCoord(pipe=20, data=0, model=3): 83, ProcessCoord(pipe=21, data=0, model=0): 84, ProcessCoord(pipe=21, data=0, model=1): 85, ProcessCoord(pipe=21, data=0, model=2): 86, ProcessCoord(pipe=21, data=0, model=3): 87, ProcessCoord(pipe=22, data=0, model=0): 88, ProcessCoord(pipe=22, data=0, model=1): 89, ProcessCoord(pipe=22, data=0, model=2): 90, ProcessCoord(pipe=22, data=0, model=3): 91, ProcessCoord(pipe=23, data=0, model=0): 92, ProcessCoord(pipe=23, data=0, model=1): 93, ProcessCoord(pipe=23, data=0, model=2): 94, ProcessCoord(pipe=23, data=0, model=3): 95, ProcessCoord(pipe=24, data=0, model=0): 96, ProcessCoord(pipe=24, data=0, model=1): 97, ProcessCoord(pipe=24, data=0, model=2): 98, ProcessCoord(pipe=24, data=0, model=3): 99, ProcessCoord(pipe=25, data=0, model=0): 100, ProcessCoord(pipe=25, data=0, model=1): 101, ProcessCoord(pipe=25, data=0, model=2): 102, ProcessCoord(pipe=25, data=0, model=3): 103, ProcessCoord(pipe=26, data=0, model=0): 104, ProcessCoord(pipe=26, data=0, model=1): 105, ProcessCoord(pipe=26, data=0, model=2): 106, ProcessCoord(pipe=26, data=0, model=3): 
107, ProcessCoord(pipe=27, data=0, model=0): 108, ProcessCoord(pipe=27, data=0, model=1): 109, ProcessCoord(pipe=27, data=0, model=2): 110, ProcessCoord(pipe=27, data=0, model=3): 111, ProcessCoord(pipe=28, data=0, model=0): 112, ProcessCoord(pipe=28, data=0, model=1): 113, ProcessCoord(pipe=28, data=0, model=2): 114, ProcessCoord(pipe=28, data=0, model=3): 115, ProcessCoord(pipe=29, data=0, model=0): 116, ProcessCoord(pipe=29, data=0, model=1): 117, ProcessCoord(pipe=29, data=0, model=2): 118, ProcessCoord(pipe=29, data=0, model=3): 119, ProcessCoord(pipe=30, data=0, model=0): 120, ProcessCoord(pipe=30, data=0, model=1): 121, ProcessCoord(pipe=30, data=0, model=2): 122, ProcessCoord(pipe=30, data=0, model=3): 123, ProcessCoord(pipe=31, data=0, model=0): 124, ProcessCoord(pipe=31, data=0, model=1): 125, ProcessCoord(pipe=31, data=0, model=2): 126, ProcessCoord(pipe=31, data=0, model=3): 127}
[2022-02-06 23:39:39,477] [INFO] [module.py:365:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=5
    0: _to_float16
    1: EmbeddingPipe
    2:
    3: ParallelTransformerLayerPipe
    4: ParallelTransformerLayerPipe
stage=1 layers=2
    5: ParallelTransformerLayerPipe
    6: ParallelTransformerLayerPipe
stage=2 layers=2
    7: ParallelTransformerLayerPipe
    8: ParallelTransformerLayerPipe
stage=3 layers=2
    9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
stage=4 layers=2
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
stage=5 layers=2
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=6 layers=2
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
stage=7 layers=2
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
stage=8 layers=2
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
stage=9 layers=2
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
stage=10 layers=2
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
stage=11 layers=2
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
stage=12 layers=2
    27: ParallelTransformerLayerPipe
    28: ParallelTransformerLayerPipe
stage=13 layers=2
    29: ParallelTransformerLayerPipe
    30: ParallelTransformerLayerPipe
stage=14 layers=2
    31: ParallelTransformerLayerPipe
    32: ParallelTransformerLayerPipe
stage=15 layers=2
    33: ParallelTransformerLayerPipe
    34: ParallelTransformerLayerPipe
stage=16 layers=2
    35: ParallelTransformerLayerPipe
    36: ParallelTransformerLayerPipe
stage=17 layers=2
    37: ParallelTransformerLayerPipe
    38: ParallelTransformerLayerPipe
stage=18 layers=2
    39: ParallelTransformerLayerPipe
    40: ParallelTransformerLayerPipe
stage=19 layers=2
    41: ParallelTransformerLayerPipe
    42: ParallelTransformerLayerPipe
stage=20 layers=2
    43: ParallelTransformerLayerPipe
    44: ParallelTransformerLayerPipe
stage=21 layers=2
    45: ParallelTransformerLayerPipe
    46: ParallelTransformerLayerPipe
stage=22 layers=2
    47: ParallelTransformerLayerPipe
    48: ParallelTransformerLayerPipe
stage=23 layers=2
    49: ParallelTransformerLayerPipe
    50: ParallelTransformerLayerPipe
stage=24 layers=2
    51: ParallelTransformerLayerPipe
    52: ParallelTransformerLayerPipe
stage=25 layers=2
    53: ParallelTransformerLayerPipe
    54: ParallelTransformerLayerPipe
stage=26 layers=2
    55: ParallelTransformerLayerPipe
    56: ParallelTransformerLayerPipe
stage=27 layers=2
    57: ParallelTransformerLayerPipe
    58: ParallelTransformerLayerPipe
stage=28 layers=2
    59: ParallelTransformerLayerPipe
    60: ParallelTransformerLayerPipe
stage=29 layers=2
    61: ParallelTransformerLayerPipe
    62: ParallelTransformerLayerPipe
stage=30 layers=2
    63: ParallelTransformerLayerPipe
    64: ParallelTransformerLayerPipe
stage=31 layers=6
    65: ParallelTransformerLayerPipe
    66: ParallelTransformerLayerPipe
    67:
    68: MixedFusedLayerNorm
    69: EmbeddingPipe
    70: float16_to_fp32
loss: CrossEntropy
[2022-02-06 23:39:40,103] [INFO] [utils.py:824:see_memory_usage] After Building Model
[2022-02-06 23:39:40,103] [INFO] [utils.py:825:see_memory_usage] MA 1.88 GB Max_MA 
1.88 GB CA 1.91 GB Max_CA 2 GB [2022-02-06 23:39:40,103] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 49.95 GB, percent = 9.9% setting training iterations to 292968 > learning rate decay style: cosine DeepSpeed is enabled. [2022-02-06 23:39:40,220] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed info: version=0.6.0+ba9c4cc7, git-hash=ba9c4cc7, git-branch=master [2022-02-06 23:39:41,036] [INFO] [engine.py:275:__init__] DeepSpeed Flops Profiler Enabled: False [2022-02-06 23:39:41,037] [INFO] [engine.py:1099:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer [2022-02-06 23:39:41,037] [INFO] [engine.py:1105:_configure_optimizer] Using client Optimizer as basic optimizer [2022-02-06 23:39:41,037] [INFO] [engine.py:1121:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam [2022-02-06 23:39:41,037] [INFO] [utils.py:48:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type= [2022-02-06 23:39:41,037] [INFO] [logging.py:69:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer [2022-02-06 23:39:41,037] [INFO] [stage_1_and_2.py:121:__init__] Reduce bucket size 500000000 [2022-02-06 23:39:41,037] [INFO] [stage_1_and_2.py:122:__init__] Allgather bucket size 500000000 [2022-02-06 23:39:41,037] [INFO] [stage_1_and_2.py:123:__init__] CPU Offload: False [2022-02-06 23:39:41,037] [INFO] [stage_1_and_2.py:124:__init__] Round robin gradient partitioning: False Rank: 33 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 75 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 32 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 74 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 38 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 67 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 39 partition count [1, 1] and sizes[(807360000, False), 
(179800, False)] Rank: 78 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 79 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 72 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 73 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 16 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 66 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 11 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 10 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 68 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 27 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 28 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 69 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 29 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 26 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 114 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 115 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 83 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 31 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 17 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 118 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 119 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 120 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 30 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 82 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 121 partition count [1, 1] and sizes[(807360000, False), (179800, False)] 
Rank: 52 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 53 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 127 partition count [1, 1] and sizes[(978123600, False), (214600, False)] Rank: 12 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 13 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 126 partition count [1, 1] and sizes[(978123600, False), (214600, False)] Rank: 111 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 110 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 99 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 98 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 94 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 95 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 76 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 77 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 109 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 108 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 15 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 91 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 4 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 42 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 22 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 90 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 43 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 93 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 35 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 92 partition 
count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 34 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 36 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 37 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 45 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 20 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 44 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 89 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 21 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 88 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 86 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 5 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 87 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 23 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 104 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 105 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 6 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 7 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 55 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 100 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 54 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 101 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 14 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 71 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 123 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 122 partition count [1, 1] and 
sizes[(807360000, False), (179800, False)] Rank: 70 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 58 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 63 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 59 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 62 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 107 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 106 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 102 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 103 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 18 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 19 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 46 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 47 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 56 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 57 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 61 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 60 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 84 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 85 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 48 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 81 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 80 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 65 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 64 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 25 partition count [1, 1] and sizes[(807360000, 
False), (179800, False)] Rank: 24 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 49 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 9 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 8 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 116 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 117 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 51 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 41 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 50 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 40 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 3 partition count [1, 1] and sizes[(978123600, False), (191400, False)] Rank: 2 partition count [1, 1] and sizes[(978123600, False), (191400, False)] Rank: 113 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 112 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 1 partition count [1, 1] and sizes[(978123600, False), (191400, False)] Rank: 97 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 96 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 0 partition count [1, 1] and sizes[(978123600, False), (191400, False)] Rank: 125 partition count [1, 1] and sizes[(978123600, False), (214600, False)] Rank: 124 partition count [1, 1] and sizes[(978123600, False), (214600, False)] [2022-02-06 23:39:45,684] [INFO] [utils.py:824:see_memory_usage] Before initializing optimizer states [2022-02-06 23:39:45,685] [INFO] [utils.py:825:see_memory_usage] MA 5.47 GB Max_MA 7.29 GB CA 9.25 GB Max_CA 9 GB [2022-02-06 23:39:45,685] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 49.96 GB, percent = 9.9% [2022-02-06 23:39:45,760] [INFO] 
[utils.py:824:see_memory_usage] After initializing optimizer states [2022-02-06 23:39:45,760] [INFO] [utils.py:825:see_memory_usage] MA 12.76 GB Max_MA 16.41 GB CA 20.19 GB Max_CA 20 GB [2022-02-06 23:39:45,761] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 49.96 GB, percent = 9.9% [2022-02-06 23:39:45,761] [INFO] [stage_1_and_2.py:493:__init__] optimizer state initialized [2022-02-06 23:39:45,783] [INFO] [utils.py:824:see_memory_usage] After initializing ZeRO optimizer [2022-02-06 23:39:45,783] [INFO] [utils.py:825:see_memory_usage] MA 12.76 GB Max_MA 12.76 GB CA 20.19 GB Max_CA 20 GB [2022-02-06 23:39:45,783] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 49.96 GB, percent = 9.9% [2022-02-06 23:39:45,783] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam [2022-02-06 23:39:45,783] [INFO] [engine.py:805:_configure_lr_scheduler] DeepSpeed using client LR scheduler [2022-02-06 23:39:45,783] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed LR Scheduler = [2022-02-06 23:39:45,783] [INFO] [logging.py:69:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.95), (0.9, 0.95)] [2022-02-06 23:39:45,783] [INFO] [config.py:1058:print] DeepSpeedEngine configuration: [2022-02-06 23:39:45,784] [INFO] [config.py:1062:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2022-02-06 23:39:45,784] [INFO] [config.py:1062:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2022-02-06 23:39:45,784] [INFO] [config.py:1062:print] amp_enabled .................. False [2022-02-06 23:39:45,784] [INFO] [config.py:1062:print] amp_params ................... 
False [2022-02-06 23:39:45,784] [INFO] [config.py:1062:print] autotuning_config ............ { "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": null, "exps_dir": null, "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 } [2022-02-06 23:39:45,784] [INFO] [config.py:1062:print] bfloat16_enabled ............. False [2022-02-06 23:39:45,784] [INFO] [config.py:1062:print] checkpoint_tag_validation_enabled True [2022-02-06 23:39:45,784] [INFO] [config.py:1062:print] checkpoint_tag_validation_fail False [2022-02-06 23:39:45,784] [INFO] [config.py:1062:print] communication_data_type ...... None [2022-02-06 23:39:45,784] [INFO] [config.py:1062:print] curriculum_enabled ........... True [2022-02-06 23:39:45,784] [INFO] [config.py:1062:print] curriculum_params ............ {'curriculum_type': 'seqlen', 'min_difficulty': 64, 'max_difficulty': 2048, 'schedule_type': 'fixed_linear', 'schedule_config': {'total_curriculum_step': 36000, 'difficulty_step': 8}} [2022-02-06 23:39:45,784] [INFO] [config.py:1062:print] dataloader_drop_last ......... False [2022-02-06 23:39:45,784] [INFO] [config.py:1062:print] disable_allgather ............ False [2022-02-06 23:39:45,784] [INFO] [config.py:1062:print] dump_state ................... False [2022-02-06 23:39:45,784] [INFO] [config.py:1062:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1} [2022-02-06 23:39:45,784] [INFO] [config.py:1062:print] eigenvalue_enabled ........... 
False [2022-02-06 23:39:45,784] [INFO] [config.py:1062:print] eigenvalue_gas_boundary_resolution 1 [2022-02-06 23:39:45,784] [INFO] [config.py:1062:print] eigenvalue_layer_name ........ bert.encoder.layer [2022-02-06 23:39:45,784] [INFO] [config.py:1062:print] eigenvalue_layer_num ......... 0 [2022-02-06 23:39:45,784] [INFO] [config.py:1062:print] eigenvalue_max_iter .......... 100 [2022-02-06 23:39:45,784] [INFO] [config.py:1062:print] eigenvalue_stability ......... 1e-06 [2022-02-06 23:39:45,784] [INFO] [config.py:1062:print] eigenvalue_tol ............... 0.01 [2022-02-06 23:39:45,784] [INFO] [config.py:1062:print] eigenvalue_verbose ........... False [2022-02-06 23:39:45,784] [INFO] [config.py:1062:print] elasticity_enabled ........... False [2022-02-06 23:39:45,784] [INFO] [config.py:1062:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2022-02-06 23:39:45,784] [INFO] [config.py:1062:print] fp16_enabled ................. True [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] fp16_master_weights_and_gradients False [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] fp16_mixed_quantize .......... False [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] global_rank .................. 0 [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] gradient_accumulation_steps .. 2048 [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] gradient_clipping ............ 1.0 [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] gradient_predivide_factor .... 1.0 [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] initial_dynamic_scale ........ 4096 [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] loss_scale ................... 0 [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] memory_breakdown ............. False [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] optimizer_legacy_fusion ...... 
False [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] optimizer_name ............... None [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] optimizer_params ............. None [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] pld_enabled .................. False [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] pld_params ................... False [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] prescale_gradients ........... False [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] quantize_change_rate ......... 0.001 [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] quantize_groups .............. 1 [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] quantize_offset .............. 1000 [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] quantize_period .............. 1000 [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] quantize_rounding ............ 0 [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] quantize_start_bits .......... 16 [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] quantize_target_bits ......... 8 [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] quantize_training_enabled .... False [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] quantize_type ................ 0 [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] quantize_verbose ............. False [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] scheduler_name ............... None [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] scheduler_params ............. None [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] sparse_attention ............. None [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] sparse_gradients_enabled ..... 
False [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] steps_per_print .............. 2000 [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] tensorboard_enabled .......... False [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] tensorboard_job_name ......... DeepSpeedJobName [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] tensorboard_output_path ...... [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] train_batch_size ............. 2048 [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] train_micro_batch_size_per_gpu 1 [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] use_quantizer_kernel ......... False [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] wall_clock_breakdown ......... False [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] world_size ................... 1 [2022-02-06 23:39:45,785] [INFO] [config.py:1062:print] zero_allow_untested_optimizer False [2022-02-06 23:39:45,786] [INFO] [config.py:1062:print] zero_config .................. { "stage": 1, "contiguous_gradients": true, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": false, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_16bit_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false } [2022-02-06 23:39:45,786] [INFO] [config.py:1062:print] zero_enabled ................. True [2022-02-06 23:39:45,786] [INFO] [config.py:1062:print] zero_optimization_stage ...... 
1 [2022-02-06 23:39:45,786] [INFO] [config.py:1064:print] json = { "train_micro_batch_size_per_gpu": 1, "train_batch_size": 2.048000e+03, "gradient_clipping": 1.0, "elastic_checkpoint": true, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "curriculum_learning": { "enabled": true, "curriculum_type": "seqlen", "min_difficulty": 64, "max_difficulty": 2.048000e+03, "schedule_type": "fixed_linear", "schedule_config": { "total_curriculum_step": 3.600000e+04, "difficulty_step": 8 } }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false } [2022-02-06 23:39:45,786] [INFO] [engine.py:93:__init__] CONFIG: micro_batches=2048 micro_batch_size=1 [2022-02-06 23:39:48,122] [INFO] [engine.py:151:__init__] RANK=3 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,122] [INFO] [engine.py:151:__init__] RANK=6 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,122] [INFO] [engine.py:151:__init__] RANK=5 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,122] [INFO] [engine.py:151:__init__] RANK=7 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,122] [INFO] [engine.py:151:__init__] RANK=69 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,122] [INFO] [engine.py:151:__init__] RANK=66 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 
(104048.288M) [2022-02-06 23:39:48,122] [INFO] [engine.py:151:__init__] RANK=70 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,122] [INFO] [engine.py:151:__init__] RANK=1 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,122] [INFO] [engine.py:151:__init__] RANK=0 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,122] [INFO] [engine.py:151:__init__] RANK=4 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,122] [INFO] [engine.py:151:__init__] RANK=71 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,122] [INFO] [engine.py:151:__init__] RANK=68 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,122] [INFO] [engine.py:151:__init__] RANK=67 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,122] [INFO] [engine.py:151:__init__] RANK=65 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,122] [INFO] [engine.py:151:__init__] RANK=99 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,122] [INFO] [engine.py:151:__init__] RANK=98 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 
(807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,122] [INFO] [engine.py:151:__init__] RANK=2 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,122] [INFO] [engine.py:151:__init__] RANK=64 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,122] [INFO] [engine.py:151:__init__] RANK=36 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,122] [INFO] [engine.py:151:__init__] RANK=35 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,122] [INFO] [engine.py:151:__init__] RANK=38 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,122] [INFO] [engine.py:151:__init__] RANK=39 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,122] [INFO] [engine.py:151:__init__] RANK=100 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,122] [INFO] [engine.py:151:__init__] RANK=101 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,122] [INFO] [engine.py:151:__init__] RANK=103 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] 
[engine.py:151:__init__] RANK=34 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=33 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,122] [INFO] [engine.py:151:__init__] RANK=114 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,122] [INFO] [engine.py:151:__init__] RANK=117 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,122] [INFO] [engine.py:151:__init__] RANK=118 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,122] [INFO] [engine.py:151:__init__] RANK=119 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=102 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=37 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=32 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=96 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 
(104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,122] [INFO] [engine.py:151:__init__] RANK=115 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=97 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=18 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=22 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=106 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=105 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,122] [INFO] [engine.py:151:__init__] RANK=116 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=52 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=86 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=85 
STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=84 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=87 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=127 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=123 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=122 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=113 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=112 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=63 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=61 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) 
UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=59 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=16 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=20 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=19 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=104 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=92 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=90 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=95 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=45 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=41 STAGE=10 LAYERS=2 [23, 
25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=51 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=30 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=29 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=60 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=62 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=12 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=15 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=21 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=109 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 
23:39:48,123] [INFO] [engine.py:151:__init__] RANK=108 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=111 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=76 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=74 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=73 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=78 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=91 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=43 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=42 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=48 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) 
TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=53 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=55 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=82 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=125 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=124 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=121 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=120 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=31 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=26 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] 
[engine.py:151:__init__] RANK=57 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=58 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=14 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=13 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=9 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=17 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=23 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=110 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=79 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=72 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) 
UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=77 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=93 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=94 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=89 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=47 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=44 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=54 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=49 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=83 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=126 STAGE=31 LAYERS=6 
[65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=27 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=56 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=10 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=11 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=107 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=75 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=88 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=46 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=50 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) 
[2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=80 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=81 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=25 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=24 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=8 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=40 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-06 23:39:48,123] [INFO] [engine.py:151:__init__] RANK=28 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) > using checkpoint value 6e-05 for learning rate > using checkpoint value 6e-06 for minimum learning rate > using checkpoint value 3750000 for warmup iterations > using checkpoint value 600000000 for total number of iterations > using checkpoint value cosine for decay style [2022-02-06 23:40:13,828] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 80 [2022-02-06 23:40:14,186] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 52 [2022-02-06 23:40:15,253] 
[INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 60 [2022-02-06 23:40:15,268] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 76 [2022-02-06 23:40:15,332] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 80 [2022-02-06 23:40:15,385] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 96 [2022-02-06 23:40:15,439] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 32 [2022-02-06 23:40:15,458] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 108 [2022-02-06 23:40:15,597] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 52 [2022-02-06 23:40:15,780] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 27 [2022-02-06 23:40:15,792] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 111 [2022-02-06 23:40:15,838] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 35 [2022-02-06 23:40:15,839] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 84 [2022-02-06 23:40:15,885] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 45 [2022-02-06 23:40:15,901] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 38 [2022-02-06 23:40:16,274] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 118 [2022-02-06 23:40:16,697] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 76 [2022-02-06 23:40:16,761] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 60 [2022-02-06 23:40:16,761] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 85 [2022-02-06 23:40:16,840] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 96 [2022-02-06 23:40:16,877] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 110 [2022-02-06 23:40:16,885] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 32 [2022-02-06 23:40:16,940] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 108 [2022-02-06 23:40:16,992] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 29 [2022-02-06 23:40:17,082] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 106 [2022-02-06 23:40:17,106] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 72 [2022-02-06 23:40:17,170] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 27 [2022-02-06 23:40:17,180] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 68 [2022-02-06 23:40:17,275] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 84 [2022-02-06 23:40:17,279] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 71 [2022-02-06 23:40:17,331] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 45 [2022-02-06 23:40:17,350] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 35 [2022-02-06 23:40:17,361] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 111 [2022-02-06 23:40:17,421] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 38 [2022-02-06 23:40:17,610] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 26 [2022-02-06 23:40:17,700] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 118 [2022-02-06 23:40:17,994] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 0 [2022-02-06 23:40:18,009] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 120 [2022-02-06 23:40:18,065] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 107 [2022-02-06 23:40:18,173] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 85 [2022-02-06 23:40:18,343] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 119 [2022-02-06 23:40:18,445] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 93 [2022-02-06 23:40:18,454] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 29 [2022-02-06 23:40:18,487] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 110 [2022-02-06 23:40:18,489] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 72 [2022-02-06 23:40:18,606] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 68 [2022-02-06 23:40:18,669] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 92 [2022-02-06 23:40:18,694] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 106 [2022-02-06 23:40:18,697] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 56 [2022-02-06 23:40:18,702] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 25 [2022-02-06 23:40:18,723] [INFO] 
[engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 71 [2022-02-06 23:40:18,729] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 66 [2022-02-06 23:40:18,756] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 64 [2022-02-06 23:40:18,887] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 109 [2022-02-06 23:40:19,005] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 26 [2022-02-06 23:40:19,046] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 48 [2022-02-06 23:40:19,082] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 117 [2022-02-06 23:40:19,233] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 79 [2022-02-06 23:40:19,266] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 99 [2022-02-06 23:40:19,273] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 4 [2022-02-06 23:40:19,296] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 67 [2022-02-06 23:40:19,339] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 113 [2022-02-06 23:40:19,413] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 120 [2022-02-06 23:40:19,450] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 40 [2022-02-06 23:40:19,457] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 103 [2022-02-06 23:40:19,502] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 98 [2022-02-06 23:40:19,553] [INFO] 
[engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 107 [2022-02-06 23:40:19,561] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 112 [2022-02-06 23:40:19,610] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 30 [2022-02-06 23:40:19,621] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 81 [2022-02-06 23:40:19,646] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 10 [2022-02-06 23:40:19,786] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 8 [2022-02-06 23:40:19,796] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 119 [2022-02-06 23:40:19,804] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 61 [2022-02-06 23:40:19,815] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 0 checkpoint version 3.0 [2022-02-06 23:40:20,026] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 93 [2022-02-06 23:40:20,122] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 105 [2022-02-06 23:40:20,137] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 56 [2022-02-06 23:40:20,180] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 66 [2022-02-06 23:40:20,183] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 25 [2022-02-06 23:40:20,240] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 92 [2022-02-06 23:40:20,266] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 114 [2022-02-06 23:40:20,271] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 42 [2022-02-06 23:40:20,312] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 64 [2022-02-06 23:40:20,339] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 109 [2022-02-06 23:40:20,548] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 48 [2022-02-06 23:40:20,620] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 117 [2022-02-06 23:40:20,628] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 79 [2022-02-06 23:40:20,710] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 58 [2022-02-06 23:40:20,721] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 4 [2022-02-06 23:40:20,750] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 67 [2022-02-06 23:40:20,751] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 43 [2022-02-06 23:40:20,755] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 17 [2022-02-06 23:40:20,867] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 102 [2022-02-06 23:40:20,896] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 75 [2022-02-06 23:40:20,906] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 34 [2022-02-06 23:40:20,929] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 40 [2022-02-06 23:40:20,956] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 99 [2022-02-06 23:40:21,049] [INFO] 
[engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 30 [2022-02-06 23:40:21,073] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 103 [2022-02-06 23:40:21,119] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 116 [2022-02-06 23:40:21,136] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 113 [2022-02-06 23:40:21,187] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 12 [2022-02-06 23:40:21,208] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 74 [2022-02-06 23:40:21,230] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 81 [2022-02-06 23:40:21,281] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 98 [2022-02-06 23:40:21,285] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 36 [2022-02-06 23:40:21,289] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 61 [2022-02-06 23:40:21,318] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 112 [2022-02-06 23:40:21,362] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 57 [2022-02-06 23:40:21,433] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 11 [2022-02-06 23:40:21,467] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 8 [2022-02-06 23:40:21,533] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 21 [2022-02-06 23:40:21,546] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 105 [2022-02-06 23:40:21,554] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 83 [2022-02-06 23:40:21,585] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 122 [2022-02-06 23:40:21,711] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 78 [2022-02-06 23:40:21,715] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 114 [2022-02-06 23:40:21,733] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 10 [2022-02-06 23:40:21,740] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 19 [2022-02-06 23:40:21,883] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 37 [2022-02-06 23:40:21,892] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 97 [2022-02-06 23:40:21,893] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 115 [2022-02-06 23:40:21,936] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 31 [2022-02-06 23:40:21,940] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 24 [2022-02-06 23:40:22,013] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 20 [2022-02-06 23:40:22,071] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 55 [2022-02-06 23:40:22,118] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 58 [2022-02-06 23:40:22,165] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 82 [2022-02-06 23:40:22,180] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 65 [2022-02-06 23:40:22,183] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 15 [2022-02-06 23:40:22,191] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 59 [2022-02-06 23:40:22,235] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 41 [2022-02-06 23:40:22,244] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 95 [2022-02-06 23:40:22,260] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 91 [2022-02-06 23:40:22,276] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 101 [2022-02-06 23:40:22,298] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 47 [2022-02-06 23:40:22,318] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 88 [2022-02-06 23:40:22,364] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 42 [2022-02-06 23:40:22,368] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 46 [2022-02-06 23:40:22,383] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 6 [2022-02-06 23:40:22,394] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 75 [2022-02-06 23:40:22,395] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 17 [2022-02-06 23:40:22,401] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 102 [2022-02-06 23:40:22,418] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 43 [2022-02-06 23:40:22,475] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 39 [2022-02-06 23:40:22,531] [INFO] 
[engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 34 [2022-02-06 23:40:22,574] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 63 [2022-02-06 23:40:22,593] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 51 [2022-02-06 23:40:22,609] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 116 [2022-02-06 23:40:22,614] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 62 [2022-02-06 23:40:22,688] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 36 [2022-02-06 23:40:22,705] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 23 [2022-02-06 23:40:22,713] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 12 [2022-02-06 23:40:22,723] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 86 [2022-02-06 23:40:22,738] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 44 [2022-02-06 23:40:22,752] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 74 [2022-02-06 23:40:22,780] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 3 [2022-02-06 23:40:22,798] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 14 [2022-02-06 23:40:22,803] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 57 [2022-02-06 23:40:22,906] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 90 [2022-02-06 23:40:22,925] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 11 [2022-02-06 23:40:22,976] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 33 [2022-02-06 23:40:23,010] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 83 [2022-02-06 23:40:23,019] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 122 [2022-02-06 23:40:23,086] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 70 [2022-02-06 23:40:23,158] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 78 [2022-02-06 23:40:23,211] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 18 [2022-02-06 23:40:23,229] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 21 [2022-02-06 23:40:23,264] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 100 [2022-02-06 23:40:23,274] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 69 [2022-02-06 23:40:23,274] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 123 [2022-02-06 23:40:23,281] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 115 [2022-02-06 23:40:23,281] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 121 [2022-02-06 23:40:23,282] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 89 [2022-02-06 23:40:23,287] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 16 [2022-02-06 23:40:23,297] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 5 [2022-02-06 23:40:23,337] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 77 [2022-02-06 23:40:23,348] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 22 [2022-02-06 23:40:23,368] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 94 [2022-02-06 23:40:23,376] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 31 [2022-02-06 23:40:23,385] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 1 [2022-02-06 23:40:23,389] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 37 [2022-02-06 23:40:23,429] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 28 [2022-02-06 23:40:23,431] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 24 [2022-02-06 23:40:23,491] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 87 [2022-02-06 23:40:23,510] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 97 [2022-02-06 23:40:23,534] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 19 [2022-02-06 23:40:23,536] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 73 [2022-02-06 23:40:23,569] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 104 [2022-02-06 23:40:23,570] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 126 [2022-02-06 23:40:23,571] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 13 [2022-02-06 23:40:23,608] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 82 [2022-02-06 23:40:23,615] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 65 [2022-02-06 23:40:23,624] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 49 [2022-02-06 23:40:23,667] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 59 [2022-02-06 23:40:23,701] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 15 [2022-02-06 23:40:23,835] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 53 [2022-02-06 23:40:23,836] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 125 [2022-02-06 23:40:23,847] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 41 [2022-02-06 23:40:23,915] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 95 [2022-02-06 23:40:23,922] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 6 [2022-02-06 23:40:23,945] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 101 [2022-02-06 23:40:23,953] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 9 [2022-02-06 23:40:23,955] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 127 [2022-02-06 23:40:23,959] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 50 [2022-02-06 23:40:23,963] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 39 [2022-02-06 23:40:23,997] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 20 [2022-02-06 23:40:24,014] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 54 [2022-02-06 23:40:24,024] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 124 [2022-02-06 23:40:24,041] [INFO] 
[engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 88 [2022-02-06 23:40:24,059] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 7 [2022-02-06 23:40:24,104] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 91 [2022-02-06 23:40:24,184] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 86 [2022-02-06 23:40:24,211] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 46 [2022-02-06 23:40:24,247] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 55 [2022-02-06 23:40:24,300] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 51 [2022-02-06 23:40:24,307] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 47 [2022-02-06 23:40:24,327] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 63 [2022-02-06 23:40:24,340] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 62 [2022-02-06 23:40:24,394] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 14 [2022-02-06 23:40:24,404] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 33 [2022-02-06 23:40:24,533] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 44 [2022-02-06 23:40:24,612] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 70 [2022-02-06 23:40:24,621] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 23 [2022-02-06 23:40:24,639] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 100 [2022-02-06 23:40:24,711] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully 
read 1 ZeRO state_dicts for rank 2 [2022-02-06 23:40:24,740] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 90 [2022-02-06 23:40:24,751] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 18 [2022-02-06 23:40:24,756] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 77 [2022-02-06 23:40:24,776] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 16 [2022-02-06 23:40:24,783] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 69 [2022-02-06 23:40:24,804] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 28 [2022-02-06 23:40:24,854] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 121 [2022-02-06 23:40:24,898] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 89 [2022-02-06 23:40:24,932] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 104 [2022-02-06 23:40:25,002] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 87 [2022-02-06 23:40:25,058] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 73 [2022-02-06 23:40:25,064] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 94 [2022-02-06 23:40:25,118] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 22 [2022-02-06 23:40:25,144] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 49 [2022-02-06 23:40:25,148] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 3 [2022-02-06 23:40:25,204] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 13 [2022-02-06 23:40:25,281] 
[INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 123 [2022-02-06 23:40:25,327] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 1 [2022-02-06 23:40:25,355] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 53 [2022-02-06 23:40:25,364] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 5 [2022-02-06 23:40:25,394] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 9 [2022-02-06 23:40:25,553] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 54 [2022-02-06 23:40:25,608] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 7 [2022-02-06 23:40:25,774] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 50 [2022-02-06 23:40:25,899] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 125 [2022-02-06 23:40:26,060] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 126 [2022-02-06 23:40:26,093] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 124 [2022-02-06 23:40:26,136] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 127 [2022-02-06 23:40:26,433] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 2 successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 at iteration 17163 time (ms) | load-checkpoint: 37029.92 /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/utils.py:279: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the 
embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
estimated model parameters: 125.22432
estimated model parameters without embeddings: 103.368064
[after model, optimizer, and learning rate scheduler are built] datetime: 2022-02-06 23:40:26
> building train, validation, and test datasets ...
 > datasets target sizes (minimum size):
    train:      600000000
    validation: 20008960
    test:       10240
> building train, validation, and test datasets for GPT ...
 > building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
 > finished creating indexed dataset in 0.125977 seconds
    number of documents: 304230423
 > dataset split:
    train:
     document indices in [0, 288714672) total of 288714672 documents
    validation:
     document indices in [288714672, 303926193) total of 15211521 documents
    test:
     document indices in [303926193, 304230423) total of 304230 documents
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_shuffle_idx.npy
    loaded indexed file in 0.224 seconds
    total number of samples: 657686117
    total number of epochs: 5
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_20008960ns_2048sl_43s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_20008960ns_2048sl_43s_sample_idx.npy
 > loading shuffle-idx mapping from
/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_20008960ns_2048sl_43s_shuffle_idx.npy loaded indexed file in 0.151 seconds total number of samples: 20781483 total number of epochs: 3 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_doc_idx.npy > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_sample_idx.npy > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_shuffle_idx.npy loaded indexed file in 0.083 seconds total number of samples: 137384 total number of epochs: 1 > finished creating GPT datasets ... [after dataloaders are built] datetime: 2022-02-06 23:40:34 done with setup ... training ... Number of parameters: [tensor rank - pipeline rank] w/ and w/o embeddings: time (ms) | model-and-optimizer-setup: 48720.31 | train/valid/test-data-iterators-setup: 6886.01 [001-001] 103.3651B / 103.3651B [003-001] 103.3651B / 103.3651B [002-000] 125.2243B / 103.3681B [001-030] 103.3651B / 103.3651B[002-030] 103.3651B / 103.3651B [001-000] 125.2243B / 103.3681B [003-031] 125.2273B / 103.3710B [002-001] 103.3651B / 103.3651B [003-028] 103.3651B / 103.3651B[002-029] 103.3651B / 103.3651B[001-029] 103.3651B / 103.3651B [002-028] 103.3651B / 103.3651B [002-008] 103.3651B / 103.3651B [001-008] 103.3651B / 103.3651B[001-009] 103.3651B / 103.3651B [002-017] 103.3651B / 103.3651B[002-016] 103.3651B / 103.3651B [003-016] 103.3651B / 103.3651B [001-025] 103.3651B / 103.3651B [001-018] 103.3651B / 103.3651B [003-019] 103.3651B / 103.3651B [002-011] 103.3651B / 103.3651B[003-010] 103.3651B / 103.3651B [002-021] 103.3651B / 103.3651B [002-020] 103.3651B / 103.3651B [001-020] 103.3651B / 103.3651B [001-031] 125.2273B / 103.3710B [003-029] 103.3651B / 103.3651B [001-028] 103.3651B / 
103.3651B [003-009] 103.3651B / 103.3651B [003-008] 103.3651B / 103.3651B[002-009] 103.3651B / 103.3651B [002-006] 103.3651B / 103.3651B [003-006] 103.3651B / 103.3651B [003-015] 103.3651B / 103.3651B [001-014] 103.3651B / 103.3651B[002-014] 103.3651B / 103.3651B [003-005] 103.3651B / 103.3651B[001-004] 103.3651B / 103.3651B [002-027] 103.3651B / 103.3651B [003-027] 103.3651B / 103.3651B [003-023] 103.3651B / 103.3651B [001-022] 103.3651B / 103.3651B[003-022] 103.3651B / 103.3651B [003-030] 103.3651B / 103.3651B [003-000] 125.2243B / 103.3681B [001-016] 103.3651B / 103.3651B [001-006] 103.3651B / 103.3651B [003-007] 103.3651B / 103.3651B [001-015] 103.3651B / 103.3651B[002-015] 103.3651B / 103.3651B [003-014] 103.3651B / 103.3651B [002-003] 103.3651B / 103.3651B [003-002] 103.3651B / 103.3651B[002-002] 103.3651B / 103.3651B [002-024] 103.3651B / 103.3651B[003-025] 103.3651B / 103.3651B [001-005] 103.3651B / 103.3651B [001-026] 103.3651B / 103.3651B [001-019] 103.3651B / 103.3651B[003-018] 103.3651B / 103.3651B [002-018] 103.3651B / 103.3651B [001-023] 103.3651B / 103.3651B [002-010] 103.3651B / 103.3651B[003-011] 103.3651B / 103.3651B [003-013] 103.3651B / 103.3651B[002-013] 103.3651B / 103.3651B[001-012] 103.3651B / 103.3651B [001-013] 103.3651B / 103.3651B [003-012] 103.3651B / 103.3651B[002-012] 103.3651B / 103.3651B [003-021] 103.3651B / 103.3651B[001-021] 103.3651B / 103.3651B [002-031] 125.2273B / 103.3710B [003-017] 103.3651B / 103.3651B[001-017] 103.3651B / 103.3651B [000-009] 103.3651B / 103.3651B [002-007] 103.3651B / 103.3651B [001-007] 103.3651B / 103.3651B [001-002] 103.3651B / 103.3651B[001-003] 103.3651B / 103.3651B [002-025] 103.3651B / 103.3651B [001-024] 103.3651B / 103.3651B [003-024] 103.3651B / 103.3651B [003-004] 103.3651B / 103.3651B [002-005] 103.3651B / 103.3651B [002-004] 103.3651B / 103.3651B [002-026] 103.3651B / 103.3651B [002-019] 103.3651B / 103.3651B [002-022] 103.3651B / 103.3651B [002-023] 103.3651B / 103.3651B [001-011] 103.3651B 
/ 103.3651B [001-010] 103.3651B / 103.3651B [003-020] 103.3651B / 103.3651B [000-008] 103.3651B / 103.3651B [000-006] 103.3651B / 103.3651B [000-015] 103.3651B / 103.3651B [003-003] 103.3651B / 103.3651B [003-026] 103.3651B / 103.3651B[001-027] 103.3651B / 103.3651B [000-023] 103.3651B / 103.3651B [000-011] 103.3651B / 103.3651B [000-001] 103.3651B / 103.3651B [000-016] 103.3651B / 103.3651B[000-017] 103.3651B / 103.3651B [000-019] 103.3651B / 103.3651B [000-022] 103.3651B / 103.3651B [000-002] 103.3651B / 103.3651B [000-025] 103.3651B / 103.3651B [000-027] 103.3651B / 103.3651B [000-012] 103.3651B / 103.3651B [000-020] 103.3651B / 103.3651B [000-005] 103.3651B / 103.3651B [000-021] 103.3651B / 103.3651B [000-026] 103.3651B / 103.3651B [000-018] 103.3651B / 103.3651B [000-030] 103.3651B / 103.3651B [000-031] 125.2273B / 103.3710B [000-007] 103.3651B / 103.3651B [000-004] 103.3651B / 103.3651B [000-024] 103.3651B / 103.3651B [000-028] 103.3651B / 103.3651B [000-014] 103.3651B / 103.3651B [000-013] 103.3651B / 103.3651B [000-003] 103.3651B / 103.3651B [000-010] 103.3651B / 103.3651B [000-000] 125.2243B / 103.3681B [000-029] 103.3651B / 103.3651B [before the start of training step] datetime: 2022-02-06 23:40:34 [2022-02-06 23:40:34,079] [INFO] [checkpointing.py:547:forward] Activation Checkpointing Information [2022-02-06 23:40:34,079] [INFO] [checkpointing.py:548:forward] ----Partition Activations False, CPU CHECKPOINTING False [2022-02-06 23:40:34,079] [INFO] [checkpointing.py:551:forward] ----contiguous Memory Checkpointing False with 64 total layers [2022-02-06 23:40:34,079] [INFO] [checkpointing.py:554:forward] ----Synchronization False [2022-02-06 23:40:34,079] [INFO] [checkpointing.py:555:forward] ----Profiling time in checkpointing False [Rank 124] (after 17164 iterations) memory (MB) | allocated: 13262.15576171875 | max allocated: 20725.81689453125 | reserved: 24404.0 | max reserved: 24404.0 [Rank 126] (after 17164 iterations) memory (MB) | allocated: 
13262.15576171875 | max allocated: 20726.50439453125 | reserved: 24404.0 | max reserved: 24404.0 [Rank 122] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 123] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 iteration 17164/ 292968 | consumed samples: 35151872 | consumed tokens: 18735874048 | elapsed time per iteration (ms): 241215.4 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.788774E+00 | loss scale: 65536.0 | grad norm: 55709.630 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.008 | TFLOPs: 55.29 | [Rank 127] (after 17164 iterations) memory (MB) | allocated: 13262.15576171875 | max allocated: 20725.81689453125 | reserved: 24404.0 | max reserved: 24404.0 [Rank 0] (after 17164 iterations) memory (MB) | allocated: 13207.3203125 | max allocated: 20670.9365234375 | reserved: 24404.0 | max reserved: 24404.0 [Rank 4] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 8] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 16] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 20] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 28] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 36] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max 
allocated: 16958.53662109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 24] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 32] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 40] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 44] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.7763671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 48] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 12] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 52] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 60] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 56] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 72] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 76] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 64] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.7763671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 88] (after 17164 iterations) memory 
(MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 84] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.7763671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 80] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 92] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 68] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 96] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 100] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 104] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.7763671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 116] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 108] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 112] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 120] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 125] (after 17164 iterations) memory (MB) | allocated: 13262.15576171875 | max allocated: 20726.50439453125 | reserved: 24404.0 | max reserved: 
24404.0 [Rank 6] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 2] (after 17164 iterations) memory (MB) | allocated: 13207.3203125 | max allocated: 20670.9365234375 | reserved: 24404.0 | max reserved: 24404.0 [Rank 10] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 22] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 26] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 18] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 38] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 34] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 30] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 14] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 54] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 46] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 42] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | 
reserved: 20072.0 | max reserved: 20072.0 [Rank 66] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 62] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 58] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 50] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 86] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 78] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 70] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 94] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 74] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 82] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 110] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 98] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 118] (after 17164 iterations) memory (MB) | allocated: 
10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 114] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 90] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 106] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.7763671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 102] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 7] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 15] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 27] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 31] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 23] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 19] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 11] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 51] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 43] (after 
17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 39] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 3] (after 17164 iterations) memory (MB) | allocated: 13207.3203125 | max allocated: 20670.9365234375 | reserved: 24404.0 | max reserved: 24404.0 [Rank 35] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 63] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 75] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 47] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 55] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 71] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 79] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 83] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 59] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 67] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.7763671875 | reserved: 20072.0 | max 
reserved: 20072.0 [Rank 99] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 91] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 87] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.7763671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 111] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 95] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 103] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 115] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 107] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.7763671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 119] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 1] (after 17164 iterations) memory (MB) | allocated: 13208.896484375 | max allocated: 20672.5126953125 | reserved: 24404.0 | max reserved: 24404.0 [Rank 13] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 29] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 9] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 
16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 17] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16958.48095703125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 25] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 33] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 45] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 21] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 37] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 61] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 53] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 5] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 57] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 49] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.7763671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 65] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 69] (after 17164 iterations) memory (MB) | 
allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 41] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 77] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 73] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 85] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.7763671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 89] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 81] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 101] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 93] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 121] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 97] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 109] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 117] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 
[Rank 113] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0
[Rank 105] (after 17164 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.7763671875 | reserved: 20072.0 | max reserved: 20072.0
iteration 17165/ 292968 | consumed samples: 35153920 | consumed tokens: 18737938432 | elapsed time per iteration (ms): 165236.0 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.810886E+00 | loss scale: 65536.0 | grad norm: 129621.698 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 80.72 |
iteration 17166/ 292968 | consumed samples: 35155968 | consumed tokens: 18740002816 | elapsed time per iteration (ms): 162389.5 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.815834E+00 | loss scale: 65536.0 | grad norm: 65425.316 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.13 |
iteration 17167/ 292968 | consumed samples: 35158016 | consumed tokens: 18742067200 | elapsed time per iteration (ms): 161253.5 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.808553E+00 | loss scale: 65536.0 | grad norm: 120450.138 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.71 |
iteration 17168/ 292968 | consumed samples: 35160064 | consumed tokens: 18744131584 | elapsed time per iteration (ms): 160452.6 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.833791E+00 | loss scale: 65536.0 | grad norm: 89695.279 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.12 |
iteration 17169/ 292968 | consumed samples: 35162112 | consumed tokens: 18746195968 | elapsed time per iteration (ms): 159826.9 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.792891E+00 | loss scale: 65536.0 | grad norm: 68969.652 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.45 |
iteration 17170/ 292968 | consumed samples: 35164160 | consumed tokens: 18748260352 | elapsed time per iteration (ms): 159126.2 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.804898E+00 | loss scale: 65536.0 | grad norm: 86067.638 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.82 |
iteration 17171/ 292968 | consumed samples: 35166208 | consumed tokens: 18750324736 | elapsed time per iteration (ms): 159676.5 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.780399E+00 | loss scale: 65536.0 | grad norm: 64061.467 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.53 |
iteration 17172/ 292968 | consumed samples: 35168256 | consumed tokens: 18752389120 | elapsed time per iteration (ms): 159279.8 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.804581E+00 | loss scale: 65536.0 | grad norm: 58598.656 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.74 |
iteration 17173/ 292968 | consumed samples: 35170304 | consumed tokens: 18754453504 | elapsed time per iteration (ms): 159024.7 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.800076E+00 | loss scale: 65536.0 | grad norm: 89479.377 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.87 |
iteration 17174/ 292968 | consumed samples: 35172352 | consumed tokens: 18756517888 | elapsed time per iteration (ms): 158950.6 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.820457E+00 | loss scale: 65536.0 | grad norm: 81284.704 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.91 |
iteration 17175/ 292968 | consumed samples: 35174400 | consumed tokens: 18758582272 | elapsed time per iteration (ms): 159321.5 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.784379E+00 | loss scale: 65536.0 | grad norm: 77517.493 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.71 |
iteration 17176/ 292968 | consumed samples: 35176448 | consumed tokens: 18760646656 | elapsed time per iteration (ms): 159210.1 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.794291E+00 | loss scale: 65536.0 | grad norm: 84755.356 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.77 |
iteration 17177/ 292968 | consumed samples: 35178496 | consumed tokens: 18762711040 | elapsed time per iteration (ms): 158915.0 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.775538E+00 | loss scale: 65536.0 | grad norm: 73789.282 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.93 |
iteration 17178/ 292968 | consumed samples: 35180544 | consumed tokens: 18764775424 | elapsed time per iteration (ms): 158521.2 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.811158E+00 | loss scale: 65536.0 | grad norm: 63067.584 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.14 |
iteration 17179/ 292968 | consumed samples: 35182592 | consumed tokens: 18766839808 | elapsed time per iteration (ms): 159551.5 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.804392E+00 | loss scale: 65536.0 | grad norm: 100771.278 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.59 |
iteration 17180/ 292968 | consumed samples: 35184640 | consumed tokens: 18768904192 | elapsed time per iteration (ms): 159286.1 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.785079E+00 | loss scale: 65536.0 | grad norm: 46824.218 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.73 |
iteration 17181/ 292968 | consumed samples: 35186688 | consumed tokens: 18770968576 | elapsed time per iteration (ms): 158660.8 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.828054E+00 | loss scale: 65536.0 | grad norm: 98633.936 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.06 |
iteration 17182/ 292968 | consumed samples: 35188736 | consumed tokens: 18773032960 | elapsed time per iteration (ms): 158818.9 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.811841E+00 | loss scale: 65536.0 | grad norm: 66556.448 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.98 |
iteration 17183/ 292968 | consumed samples: 35190784 | consumed tokens: 18775097344 | elapsed time per iteration (ms): 159248.8 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.813796E+00 | loss scale: 65536.0 | grad norm: 64189.952 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.75 |
iteration 17184/ 292968 | consumed samples: 35192832 | consumed tokens: 18777161728 | elapsed time per iteration (ms): 159094.4 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.780782E+00 | loss scale: 65536.0 | grad norm: 95144.359 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.83 |
iteration 17185/ 292968 | consumed samples: 35194880 | consumed tokens: 18779226112 | elapsed time per iteration (ms): 159473.5 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.809792E+00 | loss scale: 65536.0 | grad norm: 84228.229 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.63 |
iteration 17186/ 292968 | consumed samples: 35196928 | consumed tokens: 18781290496 | elapsed time per iteration (ms): 158906.7 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.819648E+00 | loss scale: 65536.0 | grad norm: 69118.323 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.93 |
iteration 17187/ 292968 | consumed samples: 35198976 | consumed tokens: 18783354880 | elapsed time per iteration (ms): 159585.8 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.821316E+00 | loss scale: 65536.0 | grad norm: 71374.788 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.57 |
iteration 17188/ 292968 | consumed samples: 35201024 | consumed tokens: 18785419264 | elapsed time per iteration (ms): 158847.8 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.788062E+00 | loss scale: 65536.0 | grad norm: 63705.733 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.96 |
[2022-02-07 00:51:06,901] [INFO] [stage_1_and_2.py:1648:step] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536.0, reducing to 32768.0
iteration 17189/ 292968 | consumed samples: 35203072 | consumed tokens: 18787483648 | elapsed time per iteration (ms): 158956.5 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.814483E+00 | loss scale: 32768.0 | grad norm: 63705.733 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.91 |
iteration 17190/ 292968 | consumed samples: 35205120 | consumed tokens: 18789548032 | elapsed time per iteration (ms): 158751.1 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.813210E+00 | loss scale: 32768.0 | grad norm: 51520.312 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.01 |
iteration 17191/ 292968 | consumed samples: 35207168 | consumed tokens: 18791612416 | elapsed time per iteration (ms): 158642.6 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.860636E+00 | loss scale: 32768.0 | grad norm: 36541.542 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.07 |
iteration 17192/ 292968 | consumed samples: 35209216 | consumed tokens: 18793676800 | elapsed time per iteration (ms): 158388.4 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.838365E+00 | loss scale: 32768.0 | grad norm: 47664.220 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.21 |
iteration 17193/ 292968 | consumed samples: 35211264 | consumed tokens: 18795741184 | elapsed time per iteration (ms): 158474.3 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.804365E+00 | loss scale: 32768.0 | grad norm: 39115.039 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.16 |
iteration 17194/ 292968 | consumed samples: 35213312 | consumed tokens: 18797805568 | elapsed time per iteration (ms): 158967.5 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.807918E+00 | loss scale: 32768.0 | grad norm: 44843.965 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.90 |
iteration 17195/ 292968 | consumed samples: 35215360 | consumed tokens: 18799869952 | elapsed time per iteration (ms): 158640.7 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.844936E+00 | loss scale: 32768.0 | grad norm: 27827.699 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.07 |
iteration 17196/ 292968 | consumed samples: 35217408 | consumed tokens: 18801934336 | elapsed time per iteration (ms): 158983.2 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.846709E+00 | loss scale: 32768.0 | grad norm: 60917.720 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.89 |
iteration 17197/ 292968 | consumed samples: 35219456 | consumed tokens: 18803998720 | elapsed time per iteration (ms): 159271.2 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.847939E+00 | loss scale: 32768.0 | grad norm: 32317.993 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.74 |
iteration 17198/ 292968 | consumed samples: 35221504 | consumed tokens: 18806063104 | elapsed time per iteration (ms): 158824.8 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.852970E+00 | loss scale: 32768.0 | grad norm: 48496.564 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.98 |
iteration 17199/ 292968 | consumed samples: 35223552 | consumed tokens: 18808127488 | elapsed time per iteration (ms): 158777.0 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.829903E+00 | loss scale: 32768.0 | grad norm: 28886.150 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.00 |
iteration 17200/ 292968 | consumed samples: 35225600 | consumed tokens: 18810191872 | elapsed time per iteration (ms): 158349.6 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.847791E+00 | loss scale: 32768.0 | grad norm: 48787.716 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.23 |
saving checkpoint at iteration 17200 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
[2022-02-07 01:20:18,128] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/mp_rank_01_model_states.pt
[2022-02-07 01:20:18,186] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/mp_rank_00_model_states.pt
[2022-02-07 01:20:38,733] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_38_optim_states.pt
[2022-02-07 01:20:39,084] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_39_optim_states.pt
[2022-02-07 01:20:39,457] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-02-07 01:20:39,487] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-02-07 01:20:39,657] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-02-07 01:20:39,763] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-02-07 01:20:39,880] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-02-07 01:20:39,994] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-02-07 01:20:40,229] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-02-07 01:20:40,446] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-02-07 01:20:40,646] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-02-07 01:20:41,602] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-02-07 01:20:41,795] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-02-07 01:20:41,960] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-02-07 01:20:43,058] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-02-07 01:20:43,287] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-02-07 01:20:43,353] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-02-07 01:20:43,465] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-02-07 01:20:43,510] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-02-07 01:20:43,663] [INFO] [engine.py:3023:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-02-07 01:20:43,763] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-02-07 01:20:43,847] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-02-07 01:20:44,096] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-02-07 01:20:44,096] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-02-07 01:20:44,357] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-02-07 01:20:44,597] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-02-07 01:20:44,608] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-02-07 01:20:44,656] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-02-07 01:20:44,761] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-02-07 01:20:44,641] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-02-07 01:20:44,927] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-02-07 01:20:44,961] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-02-07 01:20:45,088] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-02-07 01:20:45,174] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-02-07 01:20:45,206] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-02-07 01:20:45,232] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-02-07 01:20:45,173] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-02-07 
01:20:45,300] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-02-07 01:20:45,530] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-02-07 01:20:45,723] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-02-07 01:20:45,948] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-02-07 01:20:45,995] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-02-07 01:20:46,107] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-02-07 01:20:46,129] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-02-07 01:20:46,157] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-02-07 01:20:46,227] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-02-07 01:20:46,285] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-02-07 01:20:46,267] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-02-07 01:20:46,340] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-02-07 01:20:46,588] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-02-07 01:20:46,586] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-02-07 01:20:46,901] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-02-07 01:20:47,212] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-02-07 01:20:47,304] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-02-07 01:20:47,392] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-02-07 01:20:47,412] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-02-07 01:20:47,635] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-02-07 01:20:47,749] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-02-07 01:20:47,770] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-02-07 01:20:47,923] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-02-07 01:20:48,012] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-02-07 01:20:48,274] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-02-07 01:20:48,322] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-02-07 01:20:48,459] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-02-07 01:20:48,476] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-02-07 01:20:48,505] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-02-07 01:20:48,628] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_106_optim_states.pt [2022-02-07 01:20:48,666] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-02-07 01:20:48,666] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_111_optim_states.pt [2022-02-07 01:20:48,656] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-02-07 01:20:48,722] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-02-07 01:20:48,944] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-02-07 
01:20:49,001] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-02-07 01:20:49,176] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-02-07 01:20:49,204] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-02-07 01:20:49,232] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-02-07 01:20:49,427] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-02-07 01:20:49,984] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-02-07 01:20:50,579] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-02-07 01:20:50,675] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-02-07 01:20:51,227] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-02-07 01:20:51,423] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-02-07 01:20:51,514] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-02-07 01:20:51,567] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-02-07 01:20:51,859] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-02-07 01:20:51,881] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-02-07 01:20:52,373] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-02-07 01:20:52,675] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-02-07 01:20:52,768] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-02-07 01:20:52,451] [INFO] [engine.py:3023:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-02-07 01:20:52,835] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-02-07 01:20:53,967] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-02-07 01:20:54,577] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-02-07 01:20:54,757] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-02-07 01:20:54,992] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-02-07 01:20:55,049] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-02-07 01:20:55,059] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-02-07 01:20:55,219] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-02-07 01:20:55,237] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-02-07 01:20:55,418] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-02-07 01:20:55,320] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-02-07 01:20:55,862] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-02-07 01:20:55,892] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_124_optim_states.pt [2022-02-07 01:20:56,037] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-02-07 01:20:56,077] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-02-07 01:20:57,108] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-02-07 01:20:57,157] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-02-07 
01:20:57,285] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-02-07 01:20:57,857] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-02-07 01:20:58,534] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-02-07 01:20:58,787] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-02-07 01:20:59,474] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-02-07 01:20:59,553] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-02-07 01:20:59,555] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-02-07 01:20:59,579] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-02-07 01:20:59,667] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-02-07 01:20:59,866] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-02-07 01:20:59,874] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-02-07 01:20:59,917] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-02-07 01:21:00,524] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-02-07 01:21:00,532] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-02-07 01:21:00,648] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-02-07 01:21:00,801] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-02-07 01:21:00,871] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-02-07 01:21:01,274] [INFO] [engine.py:3023:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-02-07 01:21:01,410] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-02-07 01:21:02,241] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-02-07 01:21:02,328] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17200/zero_pp_rank_0_mp_rank_41_optim_states.pt successfully saved checkpoint at iteration 17200 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 time (ms) | save-checkpoint: 49353.28 iteration 17201/ 292968 | consumed samples: 35227648 | consumed tokens: 18812256256 | elapsed time per iteration (ms): 207870.6 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.839277E+00 | loss scale: 32768.0 | grad norm: 28609.613 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.010 | TFLOPs: 64.16 | iteration 17202/ 292968 | consumed samples: 35229696 | consumed tokens: 18814320640 | elapsed time per iteration (ms): 158176.1 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.844880E+00 | loss scale: 32768.0 | grad norm: 46802.718 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.32 | iteration 17203/ 292968 | consumed samples: 35231744 | consumed tokens: 18816385024 | elapsed time per iteration (ms): 158844.3 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.865343E+00 | 
loss scale: 32768.0 | grad norm: 34015.778 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.96 | iteration 17204/ 292968 | consumed samples: 35233792 | consumed tokens: 18818449408 | elapsed time per iteration (ms): 157928.8 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.834736E+00 | loss scale: 32768.0 | grad norm: 111933.666 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.45 | iteration 17205/ 292968 | consumed samples: 35235840 | consumed tokens: 18820513792 | elapsed time per iteration (ms): 158473.7 | learning rate: 5.931E-05 | global batch size: 2048 | lm loss: 2.920343E+00 | loss scale: 32768.0 | grad norm: 74636.515 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.16 | iteration 17206/ 292968 | consumed samples: 35237888 | consumed tokens: 18822578176 | elapsed time per iteration (ms): 158199.5 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.911461E+00 | loss scale: 32768.0 | grad norm: 87283.035 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.31 | iteration 17207/ 292968 | consumed samples: 35239936 | consumed tokens: 18824642560 | elapsed time per iteration (ms): 158699.2 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.874395E+00 | loss scale: 32768.0 | grad norm: 44484.020 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.04 | iteration 17208/ 292968 | consumed samples: 35241984 | consumed tokens: 18826706944 | elapsed time per iteration (ms): 158638.6 | learning rate: 5.930E-05 | global batch size: 2048 | lm 
loss: 2.868932E+00 | loss scale: 32768.0 | grad norm: 45090.539 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.07 | iteration 17209/ 292968 | consumed samples: 35244032 | consumed tokens: 18828771328 | elapsed time per iteration (ms): 158530.0 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.836929E+00 | loss scale: 32768.0 | grad norm: 38076.904 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.13 | iteration 17210/ 292968 | consumed samples: 35246080 | consumed tokens: 18830835712 | elapsed time per iteration (ms): 159146.6 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.853310E+00 | loss scale: 32768.0 | grad norm: 37817.046 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.81 | iteration 17211/ 292968 | consumed samples: 35248128 | consumed tokens: 18832900096 | elapsed time per iteration (ms): 158584.5 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.860213E+00 | loss scale: 32768.0 | grad norm: 32030.228 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.10 | iteration 17212/ 292968 | consumed samples: 35250176 | consumed tokens: 18834964480 | elapsed time per iteration (ms): 158511.3 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.844354E+00 | loss scale: 32768.0 | grad norm: 39375.498 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.14 | iteration 17213/ 292968 | consumed samples: 35252224 | consumed tokens: 18837028864 | elapsed time per iteration (ms): 158500.2 | learning rate: 5.930E-05 | global batch 
size: 2048 | lm loss: 2.855498E+00 | loss scale: 32768.0 | grad norm: 33510.230 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.15 | iteration 17214/ 292968 | consumed samples: 35254272 | consumed tokens: 18839093248 | elapsed time per iteration (ms): 158600.0 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.837303E+00 | loss scale: 32768.0 | grad norm: 31230.728 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.09 | iteration 17215/ 292968 | consumed samples: 35256320 | consumed tokens: 18841157632 | elapsed time per iteration (ms): 158314.5 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.841766E+00 | loss scale: 32768.0 | grad norm: 35974.943 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.25 | iteration 17216/ 292968 | consumed samples: 35258368 | consumed tokens: 18843222016 | elapsed time per iteration (ms): 158567.6 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.849174E+00 | loss scale: 32768.0 | grad norm: 35725.193 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.11 | iteration 17217/ 292968 | consumed samples: 35260416 | consumed tokens: 18845286400 | elapsed time per iteration (ms): 158843.6 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.851026E+00 | loss scale: 32768.0 | grad norm: 53178.386 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.97 | iteration 17218/ 292968 | consumed samples: 35262464 | consumed tokens: 18847350784 | elapsed time per iteration (ms): 158933.2 | learning rate: 5.930E-05 
| global batch size: 2048 | lm loss: 2.851266E+00 | loss scale: 32768.0 | grad norm: 33592.735 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.92 | iteration 17219/ 292968 | consumed samples: 35264512 | consumed tokens: 18849415168 | elapsed time per iteration (ms): 158930.0 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.862234E+00 | loss scale: 32768.0 | grad norm: 38274.229 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.92 | iteration 17220/ 292968 | consumed samples: 35266560 | consumed tokens: 18851479552 | elapsed time per iteration (ms): 159292.6 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.857918E+00 | loss scale: 32768.0 | grad norm: 45614.968 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.73 | iteration 17221/ 292968 | consumed samples: 35268608 | consumed tokens: 18853543936 | elapsed time per iteration (ms): 158254.4 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.856626E+00 | loss scale: 32768.0 | grad norm: 38839.475 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.28 | iteration 17222/ 292968 | consumed samples: 35270656 | consumed tokens: 18855608320 | elapsed time per iteration (ms): 159073.9 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.851610E+00 | loss scale: 32768.0 | grad norm: 37143.447 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.84 | iteration 17223/ 292968 | consumed samples: 35272704 | consumed tokens: 18857672704 | elapsed time per iteration (ms): 158416.4 | learning 
rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.879056E+00 | loss scale: 32768.0 | grad norm: 43692.897 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.19 | iteration 17224/ 292968 | consumed samples: 35274752 | consumed tokens: 18859737088 | elapsed time per iteration (ms): 158702.1 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.888955E+00 | loss scale: 32768.0 | grad norm: 49698.359 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.04 | iteration 17225/ 292968 | consumed samples: 35276800 | consumed tokens: 18861801472 | elapsed time per iteration (ms): 158372.8 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.844027E+00 | loss scale: 32768.0 | grad norm: 47680.401 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.21 | iteration 17226/ 292968 | consumed samples: 35278848 | consumed tokens: 18863865856 | elapsed time per iteration (ms): 160275.2 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.838787E+00 | loss scale: 32768.0 | grad norm: 32188.540 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.22 | iteration 17227/ 292968 | consumed samples: 35280896 | consumed tokens: 18865930240 | elapsed time per iteration (ms): 158912.5 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.840530E+00 | loss scale: 32768.0 | grad norm: 61814.832 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.93 | iteration 17228/ 292968 | consumed samples: 35282944 | consumed tokens: 18867994624 | elapsed time per iteration (ms): 
159620.8 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.874247E+00 | loss scale: 32768.0 | grad norm: 29206.570 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.56 | iteration 17229/ 292968 | consumed samples: 35284992 | consumed tokens: 18870059008 | elapsed time per iteration (ms): 158507.0 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.881329E+00 | loss scale: 32768.0 | grad norm: 60181.908 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.14 | iteration 17230/ 292968 | consumed samples: 35287040 | consumed tokens: 18872123392 | elapsed time per iteration (ms): 158899.4 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.875564E+00 | loss scale: 32768.0 | grad norm: 35441.361 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.94 | iteration 17231/ 292968 | consumed samples: 35289088 | consumed tokens: 18874187776 | elapsed time per iteration (ms): 158599.5 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.853578E+00 | loss scale: 32768.0 | grad norm: 45842.194 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.09 | iteration 17232/ 292968 | consumed samples: 35291136 | consumed tokens: 18876252160 | elapsed time per iteration (ms): 158533.9 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.869680E+00 | loss scale: 32768.0 | grad norm: 54134.974 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.13 | iteration 17233/ 292968 | consumed samples: 35293184 | consumed tokens: 18878316544 | elapsed time per 
iteration (ms): 158571.8 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.855670E+00 | loss scale: 32768.0 | grad norm: 43094.364 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.11 | iteration 17234/ 292968 | consumed samples: 35295232 | consumed tokens: 18880380928 | elapsed time per iteration (ms): 159294.6 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.878282E+00 | loss scale: 32768.0 | grad norm: 48706.692 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.73 | iteration 17235/ 292968 | consumed samples: 35297280 | consumed tokens: 18882445312 | elapsed time per iteration (ms): 158661.4 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.857208E+00 | loss scale: 32768.0 | grad norm: 40299.699 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.06 | iteration 17236/ 292968 | consumed samples: 35299328 | consumed tokens: 18884509696 | elapsed time per iteration (ms): 158341.9 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.855665E+00 | loss scale: 32768.0 | grad norm: 37265.470 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.23 | iteration 17237/ 292968 | consumed samples: 35301376 | consumed tokens: 18886574080 | elapsed time per iteration (ms): 158366.8 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.859130E+00 | loss scale: 32768.0 | grad norm: 45773.426 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.22 | iteration 17238/ 292968 | consumed samples: 35303424 | consumed tokens: 18888638464 | 
elapsed time per iteration (ms): 158508.2 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.864195E+00 | loss scale: 32768.0 | grad norm: 41041.319 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.14 | iteration 17239/ 292968 | consumed samples: 35305472 | consumed tokens: 18890702848 | elapsed time per iteration (ms): 159416.8 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.878111E+00 | loss scale: 32768.0 | grad norm: 44468.120 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.66 | iteration 17240/ 292968 | consumed samples: 35307520 | consumed tokens: 18892767232 | elapsed time per iteration (ms): 159101.8 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.869275E+00 | loss scale: 32768.0 | grad norm: 45476.823 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.83 | iteration 17241/ 292968 | consumed samples: 35309568 | consumed tokens: 18894831616 | elapsed time per iteration (ms): 159155.9 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.877771E+00 | loss scale: 32768.0 | grad norm: 37867.967 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.80 | iteration 17242/ 292968 | consumed samples: 35311616 | consumed tokens: 18896896000 | elapsed time per iteration (ms): 159017.6 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.851010E+00 | loss scale: 32768.0 | grad norm: 42217.127 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.87 | iteration 17243/ 292968 | consumed samples: 35313664 | consumed 
tokens: 18898960384 | elapsed time per iteration (ms): 158506.8 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.880475E+00 | loss scale: 32768.0 | grad norm: 45571.892 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.14 | iteration 17244/ 292968 | consumed samples: 35315712 | consumed tokens: 18901024768 | elapsed time per iteration (ms): 158845.8 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.863154E+00 | loss scale: 32768.0 | grad norm: 37936.552 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.96 | iteration 17245/ 292968 | consumed samples: 35317760 | consumed tokens: 18903089152 | elapsed time per iteration (ms): 158499.9 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.839976E+00 | loss scale: 32768.0 | grad norm: 37121.763 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.15 | iteration 17246/ 292968 | consumed samples: 35319808 | consumed tokens: 18905153536 | elapsed time per iteration (ms): 158656.9 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.901359E+00 | loss scale: 32768.0 | grad norm: 38628.289 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.06 | iteration 17247/ 292968 | consumed samples: 35321856 | consumed tokens: 18907217920 | elapsed time per iteration (ms): 158163.9 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.892317E+00 | loss scale: 32768.0 | grad norm: 43081.999 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.33 | iteration 17248/ 292968 | consumed samples: 
35323904 | consumed tokens: 18909282304 | elapsed time per iteration (ms): 158428.1 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.928438E+00 | loss scale: 32768.0 | grad norm: 35699.483 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.19 |
iteration 17249/ 292968 | consumed samples: 35325952 | consumed tokens: 18911346688 | elapsed time per iteration (ms): 158918.4 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.877695E+00 | loss scale: 32768.0 | grad norm: 35957.377 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.93 |
iteration 17250/ 292968 | consumed samples: 35328000 | consumed tokens: 18913411072 | elapsed time per iteration (ms): 158308.8 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.885111E+00 | loss scale: 32768.0 | grad norm: 49076.123 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.25 |
-------------------------------------------------------------------------------------------
valid loss at iteration 17250 | lm loss value: 3.256584E+00 | lm loss PPL: 2.596070E+01 |
-------------------------------------------------------------------------------------------
saving checkpoint at iteration 17250 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
[2022-02-07 03:41:03,004] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/mp_rank_01_model_states.pt
[2022-02-07 03:41:03,016] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/mp_rank_00_model_states.pt
[2022-02-07 03:41:24,062] [INFO]
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-02-07 03:41:25,151] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-02-07 03:41:25,815] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-02-07 03:41:26,466] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-02-07 03:41:26,504] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-02-07 03:41:26,724] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-02-07 03:41:26,811] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-02-07 03:41:26,822] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-02-07 03:41:26,892] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-02-07 
03:41:27,157] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-02-07 03:41:27,394] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-02-07 03:41:27,844] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-02-07 03:41:28,004] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-02-07 03:41:28,010] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-02-07 03:41:28,086] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-02-07 03:41:28,080] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-02-07 03:41:28,461] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-02-07 03:41:28,615] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-02-07 03:41:28,616] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-02-07 03:41:28,978] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-02-07 03:41:29,249] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_124_optim_states.pt [2022-02-07 03:41:29,681] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-02-07 03:41:29,730] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-02-07 03:41:29,722] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-02-07 03:41:29,745] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-02-07 03:41:29,894] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-02-07 03:41:29,791] [INFO] [engine.py:3023:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-02-07 03:41:29,826] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-02-07 03:41:29,858] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-02-07 03:41:30,206] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-02-07 03:41:30,312] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_125_optim_states.pt [2022-02-07 03:41:30,348] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-02-07 03:41:30,650] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-02-07 03:41:30,871] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_108_optim_states.pt [2022-02-07 03:41:30,889] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_109_optim_states.pt [2022-02-07 03:41:31,250] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-02-07 03:41:31,246] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-02-07 03:41:31,355] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-02-07 03:41:31,275] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-02-07 03:41:31,365] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-02-07 03:41:31,265] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-02-07 03:41:31,363] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-02-07 03:41:31,379] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-02-07 03:41:31,476] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-02-07 
03:41:31,519] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-02-07 03:41:31,632] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-02-07 03:41:31,695] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-02-07 03:41:31,891] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-02-07 03:41:32,017] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-02-07 03:41:32,376] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-02-07 03:41:32,865] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-02-07 03:41:32,929] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-02-07 03:41:32,950] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_85_optim_states.pt
[2022-02-07 03:41:32,982] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_53_optim_states.pt
[2022-02-07 03:41:33,037] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_34_optim_states.pt
[2022-02-07 03:41:33,057] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_36_optim_states.pt
[2022-02-07 03:41:32,994] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_84_optim_states.pt
[2022-02-07 03:41:33,107] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_37_optim_states.pt
[2022-02-07 03:41:33,138] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_62_optim_states.pt
[2022-02-07 03:41:33,411] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_35_optim_states.pt
[2022-02-07 03:41:33,538] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_119_optim_states.pt
[2022-02-07 03:41:33,574] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_86_optim_states.pt
[2022-02-07 03:41:33,564] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_115_optim_states.pt
[2022-02-07 03:41:33,875] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_12_optim_states.pt
[2022-02-07 03:41:33,875] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_13_optim_states.pt
[2022-02-07 03:41:33,893] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_89_optim_states.pt
[2022-02-07 03:41:33,942] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_92_optim_states.pt
[2022-02-07 03:41:34,062] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_11_optim_states.pt
[2022-02-07 03:41:34,118] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_10_optim_states.pt
[2022-02-07 03:41:34,205] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_95_optim_states.pt
[2022-02-07 03:41:34,232] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_91_optim_states.pt
[2022-02-07 03:41:34,365] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_14_optim_states.pt
[2022-02-07 03:41:34,410] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_90_optim_states.pt
[2022-02-07 03:41:34,419] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_15_optim_states.pt
[2022-02-07 03:41:34,459] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_81_optim_states.pt
[2022-02-07 03:41:34,510] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_88_optim_states.pt
[2022-02-07 03:41:34,645] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_80_optim_states.pt
[2022-02-07 03:41:34,688] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_52_optim_states.pt
[2022-02-07 03:41:34,743] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_87_optim_states.pt
[2022-02-07 03:41:34,762] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_93_optim_states.pt
[2022-02-07 03:41:34,814] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_54_optim_states.pt
[2022-02-07 03:41:34,780] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_63_optim_states.pt
[2022-02-07 03:41:35,214] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_05_optim_states.pt
[2022-02-07 03:41:35,293] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_09_optim_states.pt
[2022-02-07 03:41:35,354] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_48_optim_states.pt
[2022-02-07 03:41:35,369] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_55_optim_states.pt
[2022-02-07 03:41:35,422] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_49_optim_states.pt
[2022-02-07 03:41:35,942] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_08_optim_states.pt
[2022-02-07 03:41:36,384] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_50_optim_states.pt
[2022-02-07 03:41:36,544] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_107_optim_states.pt
[2022-02-07 03:41:36,875] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_51_optim_states.pt
[2022-02-07 03:41:36,966] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_118_optim_states.pt
[2022-02-07 03:41:37,137] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_106_optim_states.pt
[2022-02-07 03:41:37,305] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_44_optim_states.pt
[2022-02-07 03:41:37,323] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_45_optim_states.pt
[2022-02-07 03:41:37,783] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_43_optim_states.pt
[2022-02-07 03:41:37,708] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_60_optim_states.pt
[2022-02-07 03:41:37,723] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_59_optim_states.pt
[2022-02-07 03:41:37,766] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_61_optim_states.pt
[2022-02-07 03:41:38,052] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_42_optim_states.pt
[2022-02-07 03:41:38,085] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_40_optim_states.pt
[2022-02-07 03:41:37,954] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_20_optim_states.pt
[2022-02-07 03:41:38,014] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_58_optim_states.pt
[2022-02-07 03:41:37,992] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_21_optim_states.pt
[2022-02-07 03:41:38,472] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_04_optim_states.pt
[2022-02-07 03:41:38,689] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_22_optim_states.pt
[2022-02-07 03:41:38,735] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_23_optim_states.pt
[2022-02-07 03:41:39,207] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_47_optim_states.pt
[2022-02-07 03:41:39,272] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_46_optim_states.pt
[2022-02-07 03:41:39,829] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_57_optim_states.pt
[2022-02-07 03:41:39,841] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_18_optim_states.pt
[2022-02-07 03:41:39,848] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_41_optim_states.pt
[2022-02-07 03:41:39,909] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_112_optim_states.pt
[2022-02-07 03:41:39,978] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_56_optim_states.pt
[2022-02-07 03:41:40,046] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_19_optim_states.pt
[2022-02-07 03:41:40,877] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_104_optim_states.pt
[2022-02-07 03:41:40,893] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_113_optim_states.pt
[2022-02-07 03:41:40,895] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_105_optim_states.pt
[2022-02-07 03:41:41,665] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_06_optim_states.pt
[2022-02-07 03:41:41,769] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_07_optim_states.pt
[2022-02-07 03:41:42,366] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_111_optim_states.pt
[2022-02-07 03:41:42,389] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_110_optim_states.pt
[2022-02-07 03:41:42,400] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_116_optim_states.pt
[2022-02-07 03:41:42,428] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_117_optim_states.pt
[2022-02-07 03:41:47,593] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_02_optim_states.pt
[2022-02-07 03:41:47,699] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_03_optim_states.pt
[2022-02-07 03:41:52,609] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2022-02-07 03:41:52,645] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17250/zero_pp_rank_0_mp_rank_01_optim_states.pt
successfully saved checkpoint at iteration 17250 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
time (ms) | save-checkpoint: 56319.44
iteration 17251/ 292968 | consumed samples: 35330048 | consumed tokens: 18915475456 | elapsed time per iteration (ms): 682378.8 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.882127E+00 | loss scale: 32768.0 | grad norm: 34621.479 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.003 | TFLOPs: 19.55 |
iteration 17252/ 292968 | consumed samples: 35332096 | consumed tokens: 18917539840 | elapsed time per iteration (ms): 164082.9 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.882247E+00 | loss scale: 32768.0 | grad norm: 34363.740 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 81.28 |
iteration 17253/ 292968 | consumed samples: 35334144 | consumed tokens: 18919604224 | elapsed time per iteration (ms): 159316.2 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.860654E+00 | loss scale: 32768.0 | grad norm: 52932.542 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.72 |
iteration 17254/ 292968 | consumed samples: 35336192 | consumed tokens: 18921668608 | elapsed time per iteration (ms): 159508.7 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.871634E+00 | loss scale: 32768.0 | grad norm: 29596.528 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.62 |
iteration 17255/ 292968 | consumed samples: 35338240 | consumed tokens: 18923732992 | elapsed time per iteration (ms): 159580.9 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.892708E+00 | loss scale: 32768.0 | grad norm: 67944.707 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.58 |
iteration 17256/ 292968 | consumed samples: 35340288 | consumed tokens: 18925797376 | elapsed time per iteration (ms): 161052.7 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.879692E+00 | loss scale: 32768.0 | grad norm: 39404.264 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.81 |
iteration 17257/ 292968 | consumed samples: 35342336 | consumed tokens: 18927861760 | elapsed time per iteration (ms): 159446.8 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.873112E+00 | loss scale: 32768.0 | grad norm: 44900.749 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.65 |
iteration 17258/ 292968 | consumed samples: 35344384 | consumed tokens: 18929926144 | elapsed time per iteration (ms): 158878.6 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.862144E+00 | loss scale: 32768.0 | grad norm: 26391.772 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.95 |
iteration 17259/ 292968 | consumed samples: 35346432 | consumed tokens: 18931990528 | elapsed time per iteration (ms): 158723.4 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.882540E+00 | loss scale: 32768.0 | grad norm: 28907.472 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.03 |
iteration 17260/ 292968 | consumed samples: 35348480 | consumed tokens: 18934054912 | elapsed time per iteration (ms): 158759.4 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.908772E+00 | loss scale: 32768.0 | grad norm: 33636.079 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.01 |
iteration 17261/ 292968 | consumed samples: 35350528 | consumed tokens: 18936119296 | elapsed time per iteration (ms): 158497.6 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.896245E+00 | loss scale: 32768.0 | grad norm: 41840.954 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.15 |
iteration 17262/ 292968 | consumed samples: 35352576 | consumed tokens: 18938183680 | elapsed time per iteration (ms): 158619.8 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.858881E+00 | loss scale: 32768.0 | grad norm: 34119.170 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.08 |
iteration 17263/ 292968 | consumed samples: 35354624 | consumed tokens: 18940248064 | elapsed time per iteration (ms): 158807.0 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.855645E+00 | loss scale: 32768.0 | grad norm: 41361.703 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.98 |
iteration 17264/ 292968 | consumed samples: 35356672 | consumed tokens: 18942312448 | elapsed time per iteration (ms): 158649.1 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.881852E+00 | loss scale: 32768.0 | grad norm: 37177.721 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.07 |
iteration 17265/ 292968 | consumed samples: 35358720 | consumed tokens: 18944376832 | elapsed time per iteration (ms): 159630.2 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.866328E+00 | loss scale: 32768.0 | grad norm: 40253.642 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.55 |
iteration 17266/ 292968 | consumed samples: 35360768 | consumed tokens: 18946441216 | elapsed time per iteration (ms): 159777.1 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.876544E+00 | loss scale: 32768.0 | grad norm: 35825.625 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.47 |
iteration 17267/ 292968 | consumed samples: 35362816 | consumed tokens: 18948505600 | elapsed time per iteration (ms): 159182.6 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.901956E+00 | loss scale: 32768.0 | grad norm: 41310.930 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.79 |
iteration 17268/ 292968 | consumed samples: 35364864 | consumed tokens: 18950569984 | elapsed time per iteration (ms): 159043.0 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.873979E+00 | loss scale: 32768.0 | grad norm: 83359.557 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.86 |
iteration 17269/ 292968 | consumed samples: 35366912 | consumed tokens: 18952634368 | elapsed time per iteration (ms): 159416.9 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.911788E+00 | loss scale: 32768.0 | grad norm: 48582.749 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.66 |
iteration 17270/ 292968 | consumed samples: 35368960 | consumed tokens: 18954698752 | elapsed time per iteration (ms): 158699.6 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.933742E+00 | loss scale: 32768.0 | grad norm: 123728.188 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.04 |
iteration 17271/ 292968 | consumed samples: 35371008 | consumed tokens: 18956763136 | elapsed time per iteration (ms): 159122.9 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.920496E+00 | loss scale: 32768.0 | grad norm: 103518.144 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.82 |
iteration 17272/ 292968 | consumed samples: 35373056 | consumed tokens: 18958827520 | elapsed time per iteration (ms): 162257.1 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.882499E+00 | loss scale: 32768.0 | grad norm: 49058.283 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.20 |
iteration 17273/ 292968 | consumed samples: 35375104 | consumed tokens: 18960891904 | elapsed time per iteration (ms): 158205.3 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.863504E+00 | loss scale: 32768.0 | grad norm: 65752.512 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.30 |
iteration 17274/ 292968 | consumed samples: 35377152 | consumed tokens: 18962956288 | elapsed time per iteration (ms): 158336.8 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.837116E+00 | loss scale: 32768.0 | grad norm: 37988.947 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 84.23 |
iteration 17275/ 292968 | consumed samples: 35379200 | consumed tokens: 18965037056 | elapsed time per iteration (ms): 160437.9 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.876899E+00 | loss scale: 32768.0 | grad norm: 51887.909 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.79 |
iteration 17276/ 292968 | consumed samples: 35381248 | consumed tokens: 18967117824 | elapsed time per iteration (ms): 160924.7 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.860259E+00 | loss scale: 32768.0 | grad norm: 37218.300 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.54 |
iteration 17277/ 292968 | consumed samples: 35383296 | consumed tokens: 18969198592 | elapsed time per iteration (ms): 160450.8 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.872345E+00 | loss scale: 32768.0 | grad norm: 79672.443 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.78 |
iteration 17278/ 292968 | consumed samples: 35385344 | consumed tokens: 18971279360 | elapsed time per iteration (ms): 160679.2 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.900504E+00 | loss scale: 32768.0 | grad norm: 53402.100 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.66 |
iteration 17279/ 292968 | consumed samples: 35387392 | consumed tokens: 18973360128 | elapsed time per iteration (ms): 160637.0 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.875491E+00 | loss scale: 32768.0 | grad norm: 81349.658 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.69 |
iteration 17280/ 292968 | consumed samples: 35389440 | consumed tokens: 18975440896 | elapsed time per iteration (ms): 160195.1 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.864275E+00 | loss scale: 32768.0 | grad norm: 62268.046 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 83.92 |
slurmstepd: error: *** STEP 1790286.0 ON jean-zay-iam01 CANCELLED AT 2022-02-07T05:02:14 ***
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 586893 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 586894 closing signal SIGTERM
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting
down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 606323 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 586895 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 606324 closing signal SIGTERM
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 581535 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 606325 closing signal SIGTERM
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 586896 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 585261 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 581536 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 585262 closing signal SIGTERM
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 584362 closing signal SIGTERM
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 581411 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 586865 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 606326 closing signal SIGTERM
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 586897 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 581537 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 584363 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 583796 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 584110 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 586866 closing signal SIGTERM
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 581412 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 585742 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 606327 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 582453 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 584111 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 581538 closing signal SIGTERM
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 583797 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 585263 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 582765 closing signal SIGTERM
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 582454 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 585743 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 672185 closing signal SIGTERM
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 582766 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 586898 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 583818 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 586867 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 584364 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 672186 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 581413 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 581539 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 584112 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 583819 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 584365 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 585264 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 581414 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 582455 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 583798 closing signal SIGTERM
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 582767 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 581540 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 585744 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 606328 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 588133 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 583799 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 586868 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 586899 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 672187 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 582768 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 583820 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 606329 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 584113 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 586900 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 585265 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 588134 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 582456 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 581415 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 585745 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 585266 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 586869 closing signal
SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 581416 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 581541 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 582769 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 584114 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 672188 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 584366 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 583821 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 588135 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 586870 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 582457 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 606330 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 672189 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 583822 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 583800 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 585746 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 581542 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 584367 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 581417 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 583823 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 672190 closing signal 
SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 582458 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 582770 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 583801 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 584115 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 581418 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 584368 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 588136 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 585747 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 584369 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 582459 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 585267 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 672191 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 586871 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 582771 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 583824 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 588137 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 582772 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 585748 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 584116 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 582460 closing signal 
SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 585268 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 583802 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 672192 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 583825 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 584117 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 586872 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 583803 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 588138 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 588139 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 585749 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 588140 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1865864 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1865865 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1865866 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1865867 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1865868 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1865869 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1865870 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1865871 closing 
signal SIGTERM
srun: Job step aborted: Waiting up to 62 seconds for job step to finish.
/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun. Note that --use_env is set by default in torchrun. If your script expects `--local_rank` argument to be set, please change it to read from `os.environ['LOCAL_RANK']` instead. See https://pytorch.org/docs/stable/distributed.html#launch-utility for further instructions
warnings.warn(
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
using world size: 128, data-parallel-size: 1, tensor-model-parallel size: 4, pipeline-model-parallel size: 32
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.95
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. None
consumed_train_samples .......................... 0
consumed_train_tokens ........................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
curriculum_learning ............................. False
data_impl ....................................... mmap
data_parallel_size .............................. 1
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.1793937.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embed_layernorm ................................. True
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 150
eval_iters ...................................... 5
eval_only ....................................... None
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1185
exit_interval ................................... None
ffn_hidden_size ................................. 46400
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
gigaflos_no_embeds .............................. 0
global_batch_size ............................... 2048
glu_activation .................................. None
hidden_dropout .................................. 0.1
hidden_size ..................................... 11600
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.006
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 145
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 1
log_learning_rate_to_tensorboard ................ True
log_level ....................................... None
log_level_replica ............................... None
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_on_targets_only ............................ False
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 6e-05
lr_decay_iters .................................. None
lr_decay_samples ................................ None
lr_decay_style .................................. cosine
lr_decay_tokens ................................. 260000000000
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 3750000
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... 2048
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt
micro_batch_size ................................ 1
min_loss_scale .................................. 1.0
min_lr .......................................... 6e-06
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 80
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 64
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 32
position_embedding_type ......................... PositionEmbeddingType.absolute
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... None
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
reweight_loss_based_on_position_frequency ....... False
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
save_interval ................................... 50
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 43
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
skip_train_iteration_range ...................... None
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 4
tensorboard_debug_dir ........................... None
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs//tensorboard/cl-a100
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
test_weighted_split_names ....................... None
test_weighted_split_paths ....................... None
test_weighted_split_splits ...................... None
test_weighted_split_weights ..................... None
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 600000000
train_tokens .................................... 300000000000
train_weighted_split_paths ...................... None
use_bnb_optimizer ............................... False
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
valid_weighted_split_names ...................... None
valid_weighted_split_paths ...................... None
valid_weighted_split_splits ..................... None
valid_weighted_split_weights .................... None
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 128
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
setting number of micro-batches to constant 2048
> building GPT2BPETokenizer tokenizer ...
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
> setting tensorboard ...
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch']
torch version .................... 1.10.2
torch cuda version ............... 11.3
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed']
deepspeed info ................... 0.6.0+ba9c4cc7, ba9c4cc7, master
deepspeed wheel compiled w. ...... torch 1.10, cuda 11.3
**** Git info for Megatron: git_hash=f689231 git_branch=log-grad-norm ****
> initializing torch distributed ...
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 32
> setting random seeds to 43 ...
[2022-02-07 05:30:39,466] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2761 and data parallel seed: 43
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/data' >>> done with dataset index builder. Compilation time: 0.127 seconds > compiling and loading fused kernels ... /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:295: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module scaled_upper_triang_masked_softmax_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module scaled_upper_triang_masked_softmax_cuda... /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:295: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. 
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module scaled_masked_softmax_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module scaled_masked_softmax_cuda... /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:295: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module fused_mix_prec_layer_norm_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module fused_mix_prec_layer_norm_cuda... >>> done with compiling and loading fused kernels. 
Compilation time: 7.613 seconds
time to initialize megatron (seconds): -8.793
[after megatron is initialized] datetime: 2022-02-07 05:30:47
building GPT model ...
[2022-02-07 05:30:47,243] [INFO] [utils.py:824:see_memory_usage] Before Building Model
[2022-02-07 05:30:47,244] [INFO] [utils.py:825:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2022-02-07 05:30:47,244] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 48.39 GB, percent = 9.6%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=1, data=0, model=0): 4, ProcessCoord(pipe=1, data=0, model=1): 5, ProcessCoord(pipe=1, data=0, model=2): 6, ProcessCoord(pipe=1, data=0, model=3): 7, ProcessCoord(pipe=2, data=0, model=0): 8, ProcessCoord(pipe=2, data=0, model=1): 9, ProcessCoord(pipe=2, data=0, model=2): 10, ProcessCoord(pipe=2, data=0, model=3): 11, ProcessCoord(pipe=3, data=0, model=0): 12, ProcessCoord(pipe=3, data=0, model=1): 13, ProcessCoord(pipe=3, data=0, model=2): 14, ProcessCoord(pipe=3, data=0, model=3): 15, ProcessCoord(pipe=4, data=0, model=0): 16, ProcessCoord(pipe=4, data=0, model=1): 17, ProcessCoord(pipe=4, data=0, model=2): 18, ProcessCoord(pipe=4, data=0, model=3): 19, ProcessCoord(pipe=5, data=0, model=0): 20, ProcessCoord(pipe=5, data=0, model=1): 21, ProcessCoord(pipe=5, data=0, model=2): 22, ProcessCoord(pipe=5, data=0, model=3): 23, ProcessCoord(pipe=6, data=0, model=0): 24, ProcessCoord(pipe=6, data=0, model=1): 25, ProcessCoord(pipe=6, data=0, model=2): 26, ProcessCoord(pipe=6, data=0, model=3): 27, ProcessCoord(pipe=7, data=0, model=0): 28, ProcessCoord(pipe=7, data=0, model=1): 29, ProcessCoord(pipe=7, data=0, model=2): 30, ProcessCoord(pipe=7, data=0, model=3): 31, ProcessCoord(pipe=8, data=0, model=0): 32, ProcessCoord(pipe=8, data=0, model=1): 33,
ProcessCoord(pipe=8, data=0, model=2): 34, ProcessCoord(pipe=8, data=0, model=3): 35, ProcessCoord(pipe=9, data=0, model=0): 36, ProcessCoord(pipe=9, data=0, model=1): 37, ProcessCoord(pipe=9, data=0, model=2): 38, ProcessCoord(pipe=9, data=0, model=3): 39, ProcessCoord(pipe=10, data=0, model=0): 40, ProcessCoord(pipe=10, data=0, model=1): 41, ProcessCoord(pipe=10, data=0, model=2): 42, ProcessCoord(pipe=10, data=0, model=3): 43, ProcessCoord(pipe=11, data=0, model=0): 44, ProcessCoord(pipe=11, data=0, model=1): 45, ProcessCoord(pipe=11, data=0, model=2): 46, ProcessCoord(pipe=11, data=0, model=3): 47, ProcessCoord(pipe=12, data=0, model=0): 48, ProcessCoord(pipe=12, data=0, model=1): 49, ProcessCoord(pipe=12, data=0, model=2): 50, ProcessCoord(pipe=12, data=0, model=3): 51, ProcessCoord(pipe=13, data=0, model=0): 52, ProcessCoord(pipe=13, data=0, model=1): 53, ProcessCoord(pipe=13, data=0, model=2): 54, ProcessCoord(pipe=13, data=0, model=3): 55, ProcessCoord(pipe=14, data=0, model=0): 56, ProcessCoord(pipe=14, data=0, model=1): 57, ProcessCoord(pipe=14, data=0, model=2): 58, ProcessCoord(pipe=14, data=0, model=3): 59, ProcessCoord(pipe=15, data=0, model=0): 60, ProcessCoord(pipe=15, data=0, model=1): 61, ProcessCoord(pipe=15, data=0, model=2): 62, ProcessCoord(pipe=15, data=0, model=3): 63, ProcessCoord(pipe=16, data=0, model=0): 64, ProcessCoord(pipe=16, data=0, model=1): 65, ProcessCoord(pipe=16, data=0, model=2): 66, ProcessCoord(pipe=16, data=0, model=3): 67, ProcessCoord(pipe=17, data=0, model=0): 68, ProcessCoord(pipe=17, data=0, model=1): 69, ProcessCoord(pipe=17, data=0, model=2): 70, ProcessCoord(pipe=17, data=0, model=3): 71, ProcessCoord(pipe=18, data=0, model=0): 72, ProcessCoord(pipe=18, data=0, model=1): 73, ProcessCoord(pipe=18, data=0, model=2): 74, ProcessCoord(pipe=18, data=0, model=3): 75, ProcessCoord(pipe=19, data=0, model=0): 76, ProcessCoord(pipe=19, data=0, model=1): 77, ProcessCoord(pipe=19, data=0, model=2): 78, ProcessCoord(pipe=19, 
data=0, model=3): 79, ProcessCoord(pipe=20, data=0, model=0): 80, ProcessCoord(pipe=20, data=0, model=1): 81, ProcessCoord(pipe=20, data=0, model=2): 82, ProcessCoord(pipe=20, data=0, model=3): 83, ProcessCoord(pipe=21, data=0, model=0): 84, ProcessCoord(pipe=21, data=0, model=1): 85, ProcessCoord(pipe=21, data=0, model=2): 86, ProcessCoord(pipe=21, data=0, model=3): 87, ProcessCoord(pipe=22, data=0, model=0): 88, ProcessCoord(pipe=22, data=0, model=1): 89, ProcessCoord(pipe=22, data=0, model=2): 90, ProcessCoord(pipe=22, data=0, model=3): 91, ProcessCoord(pipe=23, data=0, model=0): 92, ProcessCoord(pipe=23, data=0, model=1): 93, ProcessCoord(pipe=23, data=0, model=2): 94, ProcessCoord(pipe=23, data=0, model=3): 95, ProcessCoord(pipe=24, data=0, model=0): 96, ProcessCoord(pipe=24, data=0, model=1): 97, ProcessCoord(pipe=24, data=0, model=2): 98, ProcessCoord(pipe=24, data=0, model=3): 99, ProcessCoord(pipe=25, data=0, model=0): 100, ProcessCoord(pipe=25, data=0, model=1): 101, ProcessCoord(pipe=25, data=0, model=2): 102, ProcessCoord(pipe=25, data=0, model=3): 103, ProcessCoord(pipe=26, data=0, model=0): 104, ProcessCoord(pipe=26, data=0, model=1): 105, ProcessCoord(pipe=26, data=0, model=2): 106, ProcessCoord(pipe=26, data=0, model=3): 107, ProcessCoord(pipe=27, data=0, model=0): 108, ProcessCoord(pipe=27, data=0, model=1): 109, ProcessCoord(pipe=27, data=0, model=2): 110, ProcessCoord(pipe=27, data=0, model=3): 111, ProcessCoord(pipe=28, data=0, model=0): 112, ProcessCoord(pipe=28, data=0, model=1): 113, ProcessCoord(pipe=28, data=0, model=2): 114, ProcessCoord(pipe=28, data=0, model=3): 115, ProcessCoord(pipe=29, data=0, model=0): 116, ProcessCoord(pipe=29, data=0, model=1): 117, ProcessCoord(pipe=29, data=0, model=2): 118, ProcessCoord(pipe=29, data=0, model=3): 119, ProcessCoord(pipe=30, data=0, model=0): 120, ProcessCoord(pipe=30, data=0, model=1): 121, ProcessCoord(pipe=30, data=0, model=2): 122, ProcessCoord(pipe=30, data=0, model=3): 123, 
ProcessCoord(pipe=31, data=0, model=0): 124, ProcessCoord(pipe=31, data=0, model=1): 125, ProcessCoord(pipe=31, data=0, model=2): 126, ProcessCoord(pipe=31, data=0, model=3): 127}
[2022-02-07 05:30:48,956] [INFO] [module.py:365:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=5
     0: _to_float16
     1: EmbeddingPipe
     2:
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
stage=1 layers=2
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
stage=2 layers=2
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
stage=3 layers=2
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
stage=4 layers=2
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
stage=5 layers=2
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=6 layers=2
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
stage=7 layers=2
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
stage=8 layers=2
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
stage=9 layers=2
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
stage=10 layers=2
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
stage=11 layers=2
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
stage=12 layers=2
    27: ParallelTransformerLayerPipe
    28: ParallelTransformerLayerPipe
stage=13 layers=2
    29: ParallelTransformerLayerPipe
    30: ParallelTransformerLayerPipe
stage=14 layers=2
    31: ParallelTransformerLayerPipe
    32: ParallelTransformerLayerPipe
stage=15 layers=2
    33: ParallelTransformerLayerPipe
    34: ParallelTransformerLayerPipe
stage=16 layers=2
    35: ParallelTransformerLayerPipe
    36: ParallelTransformerLayerPipe
stage=17 layers=2
    37: ParallelTransformerLayerPipe
    38: ParallelTransformerLayerPipe
stage=18 layers=2
    39: ParallelTransformerLayerPipe
    40: ParallelTransformerLayerPipe
stage=19 layers=2
    41: ParallelTransformerLayerPipe
    42: ParallelTransformerLayerPipe
stage=20 layers=2
    43: ParallelTransformerLayerPipe
    44: ParallelTransformerLayerPipe
stage=21 layers=2
    45: ParallelTransformerLayerPipe
    46: ParallelTransformerLayerPipe
stage=22 layers=2
    47: ParallelTransformerLayerPipe
    48: ParallelTransformerLayerPipe
stage=23 layers=2
    49: ParallelTransformerLayerPipe
    50: ParallelTransformerLayerPipe
stage=24 layers=2
    51: ParallelTransformerLayerPipe
    52: ParallelTransformerLayerPipe
stage=25 layers=2
    53: ParallelTransformerLayerPipe
    54: ParallelTransformerLayerPipe
stage=26 layers=2
    55: ParallelTransformerLayerPipe
    56: ParallelTransformerLayerPipe
stage=27 layers=2
    57: ParallelTransformerLayerPipe
    58: ParallelTransformerLayerPipe
stage=28 layers=2
    59: ParallelTransformerLayerPipe
    60: ParallelTransformerLayerPipe
stage=29 layers=2
    61: ParallelTransformerLayerPipe
    62: ParallelTransformerLayerPipe
stage=30 layers=2
    63: ParallelTransformerLayerPipe
    64: ParallelTransformerLayerPipe
stage=31 layers=6
    65: ParallelTransformerLayerPipe
    66: ParallelTransformerLayerPipe
    67:
    68: MixedFusedLayerNorm
    69: EmbeddingPipe
    70: float16_to_fp32
  loss: CrossEntropy
[2022-02-07 05:30:49,571] [INFO] [utils.py:824:see_memory_usage] After Building Model
[2022-02-07 05:30:49,572] [INFO] [utils.py:825:see_memory_usage] MA 1.88 GB Max_MA 1.88 GB CA 1.91 GB Max_CA 2 GB
[2022-02-07 05:30:49,572] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 48.8 GB, percent = 9.7%
setting training iterations to 292968
> learning rate decay style: cosine
DeepSpeed is enabled.
[2022-02-07 05:30:49,662] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed info: version=0.6.0+ba9c4cc7, git-hash=ba9c4cc7, git-branch=master [2022-02-07 05:30:50,374] [INFO] [engine.py:275:__init__] DeepSpeed Flops Profiler Enabled: False [2022-02-07 05:30:50,374] [INFO] [engine.py:1099:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer [2022-02-07 05:30:50,374] [INFO] [engine.py:1105:_configure_optimizer] Using client Optimizer as basic optimizer [2022-02-07 05:30:50,374] [INFO] [engine.py:1121:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam [2022-02-07 05:30:50,374] [INFO] [utils.py:48:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type= [2022-02-07 05:30:50,374] [INFO] [logging.py:69:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer [2022-02-07 05:30:50,374] [INFO] [stage_1_and_2.py:121:__init__] Reduce bucket size 500000000 [2022-02-07 05:30:50,374] [INFO] [stage_1_and_2.py:122:__init__] Allgather bucket size 500000000 [2022-02-07 05:30:50,374] [INFO] [stage_1_and_2.py:123:__init__] CPU Offload: False [2022-02-07 05:30:50,374] [INFO] [stage_1_and_2.py:124:__init__] Round robin gradient partitioning: False Rank: 112 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 113 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 58 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 59 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 119 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 42 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 114 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 115 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 43 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 118 partition count [1, 1] and sizes[(807360000, False), 
(179800, False)] Rank: 45 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 44 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 80 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 81 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 23 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 6 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 66 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 122 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 7 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 50 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 123 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 55 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 51 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 54 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 67 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 22 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 19 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 18 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 3 partition count [1, 1] and sizes[(978123600, False), (191400, False)] Rank: 2 partition count [1, 1] and sizes[(978123600, False), (191400, False)] Rank: 103 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 102 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 92 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 93 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 15 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 
14 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 61 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 60 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 41 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 40 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 70 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 11 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 71 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 10 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 26 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 101 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 27 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 100 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 29 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 46 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 28 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 111 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 12 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 110 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 47 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 99 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 94 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 117 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 98 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 13 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 116 partition count 
[1, 1] and sizes[(807360000, False), (179800, False)] Rank: 52 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 36 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 53 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 30 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 31 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 89 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 37 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 57 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 88 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 56 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 95 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 75 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 74 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 84 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 106 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 85 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 107 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 69 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 68 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 25 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 24 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 97 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 96 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 32 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 90 partition count [1, 1] and 
sizes[(807360000, False), (179800, False)] Rank: 91 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 21 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 39 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 20 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 33 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 35 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 79 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 78 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 34 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 38 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 76 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 77 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 83 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 9 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 82 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 8 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 63 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 62 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 109 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 108 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 48 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 49 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 121 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 5 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 73 partition count [1, 1] and sizes[(807360000, 
False), (179800, False)] Rank: 4 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 72 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 105 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 104 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 120 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 86 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 87 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 65 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 64 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 126 partition count [1, 1] and sizes[(978123600, False), (214600, False)] Rank: 16 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 17 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 127 partition count [1, 1] and sizes[(978123600, False), (214600, False)] Rank: 124 partition count [1, 1] and sizes[(978123600, False), (214600, False)] Rank: 125 partition count [1, 1] and sizes[(978123600, False), (214600, False)] Rank: 1 partition count [1, 1] and sizes[(978123600, False), (191400, False)] Rank: 0 partition count [1, 1] and sizes[(978123600, False), (191400, False)] [2022-02-07 05:30:56,612] [INFO] [utils.py:824:see_memory_usage] Before initializing optimizer states [2022-02-07 05:30:56,612] [INFO] [utils.py:825:see_memory_usage] MA 5.47 GB Max_MA 7.29 GB CA 9.25 GB Max_CA 9 GB [2022-02-07 05:30:56,612] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 48.86 GB, percent = 9.7% [2022-02-07 05:30:56,677] [INFO] [utils.py:824:see_memory_usage] After initializing optimizer states [2022-02-07 05:30:56,678] [INFO] [utils.py:825:see_memory_usage] MA 12.76 GB Max_MA 16.41 GB CA 20.19 GB Max_CA 20 GB [2022-02-07 05:30:56,678] [INFO] [utils.py:833:see_memory_usage] CPU 
Virtual Memory: used = 48.86 GB, percent = 9.7% [2022-02-07 05:30:56,678] [INFO] [stage_1_and_2.py:493:__init__] optimizer state initialized [2022-02-07 05:30:56,706] [INFO] [utils.py:824:see_memory_usage] After initializing ZeRO optimizer [2022-02-07 05:30:56,706] [INFO] [utils.py:825:see_memory_usage] MA 12.76 GB Max_MA 12.76 GB CA 20.19 GB Max_CA 20 GB [2022-02-07 05:30:56,707] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 48.86 GB, percent = 9.7% [2022-02-07 05:30:56,707] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam [2022-02-07 05:30:56,707] [INFO] [engine.py:805:_configure_lr_scheduler] DeepSpeed using client LR scheduler [2022-02-07 05:30:56,707] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed LR Scheduler = [2022-02-07 05:30:56,707] [INFO] [logging.py:69:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.95), (0.9, 0.95)] [2022-02-07 05:30:56,707] [INFO] [config.py:1058:print] DeepSpeedEngine configuration: [2022-02-07 05:30:56,707] [INFO] [config.py:1062:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2022-02-07 05:30:56,707] [INFO] [config.py:1062:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2022-02-07 05:30:56,707] [INFO] [config.py:1062:print] amp_enabled .................. False [2022-02-07 05:30:56,707] [INFO] [config.py:1062:print] amp_params ................... False [2022-02-07 05:30:56,707] [INFO] [config.py:1062:print] autotuning_config ............ 
{ "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": null, "exps_dir": null, "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 } [2022-02-07 05:30:56,708] [INFO] [config.py:1062:print] bfloat16_enabled ............. False [2022-02-07 05:30:56,708] [INFO] [config.py:1062:print] checkpoint_tag_validation_enabled True [2022-02-07 05:30:56,708] [INFO] [config.py:1062:print] checkpoint_tag_validation_fail False [2022-02-07 05:30:56,708] [INFO] [config.py:1062:print] communication_data_type ...... None [2022-02-07 05:30:56,708] [INFO] [config.py:1062:print] curriculum_enabled ........... True [2022-02-07 05:30:56,708] [INFO] [config.py:1062:print] curriculum_params ............ {'curriculum_type': 'seqlen', 'min_difficulty': 64, 'max_difficulty': 2048, 'schedule_type': 'fixed_linear', 'schedule_config': {'total_curriculum_step': 36000, 'difficulty_step': 8}} [2022-02-07 05:30:56,708] [INFO] [config.py:1062:print] dataloader_drop_last ......... False [2022-02-07 05:30:56,708] [INFO] [config.py:1062:print] disable_allgather ............ False [2022-02-07 05:30:56,708] [INFO] [config.py:1062:print] dump_state ................... False [2022-02-07 05:30:56,708] [INFO] [config.py:1062:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1} [2022-02-07 05:30:56,708] [INFO] [config.py:1062:print] eigenvalue_enabled ........... 
False [2022-02-07 05:30:56,708] [INFO] [config.py:1062:print] eigenvalue_gas_boundary_resolution 1 [2022-02-07 05:30:56,708] [INFO] [config.py:1062:print] eigenvalue_layer_name ........ bert.encoder.layer [2022-02-07 05:30:56,708] [INFO] [config.py:1062:print] eigenvalue_layer_num ......... 0 [2022-02-07 05:30:56,708] [INFO] [config.py:1062:print] eigenvalue_max_iter .......... 100 [2022-02-07 05:30:56,708] [INFO] [config.py:1062:print] eigenvalue_stability ......... 1e-06 [2022-02-07 05:30:56,708] [INFO] [config.py:1062:print] eigenvalue_tol ............... 0.01 [2022-02-07 05:30:56,708] [INFO] [config.py:1062:print] eigenvalue_verbose ........... False [2022-02-07 05:30:56,708] [INFO] [config.py:1062:print] elasticity_enabled ........... False [2022-02-07 05:30:56,708] [INFO] [config.py:1062:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2022-02-07 05:30:56,708] [INFO] [config.py:1062:print] fp16_enabled ................. True [2022-02-07 05:30:56,708] [INFO] [config.py:1062:print] fp16_master_weights_and_gradients False [2022-02-07 05:30:56,708] [INFO] [config.py:1062:print] fp16_mixed_quantize .......... False [2022-02-07 05:30:56,708] [INFO] [config.py:1062:print] global_rank .................. 0 [2022-02-07 05:30:56,708] [INFO] [config.py:1062:print] gradient_accumulation_steps .. 2048 [2022-02-07 05:30:56,708] [INFO] [config.py:1062:print] gradient_clipping ............ 1.0 [2022-02-07 05:30:56,708] [INFO] [config.py:1062:print] gradient_predivide_factor .... 1.0 [2022-02-07 05:30:56,708] [INFO] [config.py:1062:print] initial_dynamic_scale ........ 4096 [2022-02-07 05:30:56,708] [INFO] [config.py:1062:print] loss_scale ................... 0 [2022-02-07 05:30:56,708] [INFO] [config.py:1062:print] memory_breakdown ............. False [2022-02-07 05:30:56,708] [INFO] [config.py:1062:print] optimizer_legacy_fusion ...... 
False [2022-02-07 05:30:56,708] [INFO] [config.py:1062:print] optimizer_name ............... None [2022-02-07 05:30:56,708] [INFO] [config.py:1062:print] optimizer_params ............. None [2022-02-07 05:30:56,708] [INFO] [config.py:1062:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [2022-02-07 05:30:56,708] [INFO] [config.py:1062:print] pld_enabled .................. False [2022-02-07 05:30:56,709] [INFO] [config.py:1062:print] pld_params ................... False [2022-02-07 05:30:56,709] [INFO] [config.py:1062:print] prescale_gradients ........... False [2022-02-07 05:30:56,709] [INFO] [config.py:1062:print] quantize_change_rate ......... 0.001 [2022-02-07 05:30:56,709] [INFO] [config.py:1062:print] quantize_groups .............. 1 [2022-02-07 05:30:56,709] [INFO] [config.py:1062:print] quantize_offset .............. 1000 [2022-02-07 05:30:56,709] [INFO] [config.py:1062:print] quantize_period .............. 1000 [2022-02-07 05:30:56,709] [INFO] [config.py:1062:print] quantize_rounding ............ 0 [2022-02-07 05:30:56,709] [INFO] [config.py:1062:print] quantize_start_bits .......... 16 [2022-02-07 05:30:56,709] [INFO] [config.py:1062:print] quantize_target_bits ......... 8 [2022-02-07 05:30:56,709] [INFO] [config.py:1062:print] quantize_training_enabled .... False [2022-02-07 05:30:56,709] [INFO] [config.py:1062:print] quantize_type ................ 0 [2022-02-07 05:30:56,709] [INFO] [config.py:1062:print] quantize_verbose ............. False [2022-02-07 05:30:56,709] [INFO] [config.py:1062:print] scheduler_name ............... None [2022-02-07 05:30:56,709] [INFO] [config.py:1062:print] scheduler_params ............. None [2022-02-07 05:30:56,709] [INFO] [config.py:1062:print] sparse_attention ............. None [2022-02-07 05:30:56,709] [INFO] [config.py:1062:print] sparse_gradients_enabled ..... 
False [2022-02-07 05:30:56,709] [INFO] [config.py:1062:print] steps_per_print .............. 2000 [2022-02-07 05:30:56,709] [INFO] [config.py:1062:print] tensorboard_enabled .......... False [2022-02-07 05:30:56,709] [INFO] [config.py:1062:print] tensorboard_job_name ......... DeepSpeedJobName [2022-02-07 05:30:56,709] [INFO] [config.py:1062:print] tensorboard_output_path ...... [2022-02-07 05:30:56,709] [INFO] [config.py:1062:print] train_batch_size ............. 2048 [2022-02-07 05:30:56,709] [INFO] [config.py:1062:print] train_micro_batch_size_per_gpu 1 [2022-02-07 05:30:56,709] [INFO] [config.py:1062:print] use_quantizer_kernel ......... False [2022-02-07 05:30:56,709] [INFO] [config.py:1062:print] wall_clock_breakdown ......... False [2022-02-07 05:30:56,709] [INFO] [config.py:1062:print] world_size ................... 1 [2022-02-07 05:30:56,709] [INFO] [config.py:1062:print] zero_allow_untested_optimizer False [2022-02-07 05:30:56,709] [INFO] [config.py:1062:print] zero_config .................. { "stage": 1, "contiguous_gradients": true, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": false, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_16bit_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false } [2022-02-07 05:30:56,709] [INFO] [config.py:1062:print] zero_enabled ................. True [2022-02-07 05:30:56,709] [INFO] [config.py:1062:print] zero_optimization_stage ...... 
1 [2022-02-07 05:30:56,709] [INFO] [config.py:1064:print] json = { "train_micro_batch_size_per_gpu": 1, "train_batch_size": 2.048000e+03, "gradient_clipping": 1.0, "elastic_checkpoint": true, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "curriculum_learning": { "enabled": true, "curriculum_type": "seqlen", "min_difficulty": 64, "max_difficulty": 2.048000e+03, "schedule_type": "fixed_linear", "schedule_config": { "total_curriculum_step": 3.600000e+04, "difficulty_step": 8 } }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false } [2022-02-07 05:30:56,709] [INFO] [engine.py:93:__init__] CONFIG: micro_batches=2048 micro_batch_size=1 [2022-02-07 05:30:59,062] [INFO] [engine.py:151:__init__] RANK=0 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,062] [INFO] [engine.py:151:__init__] RANK=6 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=7 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=4 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=3 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=1 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 
(104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=5 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=68 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=70 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=2 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=71 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=67 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=64 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=66 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=34 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=35 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 
(807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=36 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=37 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=102 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=99 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=100 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=69 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=38 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=114 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=118 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] 
[engine.py:151:__init__] RANK=83 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=98 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=97 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=101 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=103 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=65 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=123 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=122 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=127 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=32 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 
(104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=22 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=20 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=105 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=111 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=107 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=54 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=55 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=52 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=51 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=112 
STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=85 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=84 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=82 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=86 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=74 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=73 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=96 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=126 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=125 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) 
UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=39 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=21 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=19 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=61 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=63 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=62 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=57 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=106 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=29 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=27 STAGE=6 LAYERS=2 [15, 
17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=28 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=48 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=117 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=119 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=115 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=116 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=87 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=80 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=79 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) 
[2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=88 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=93 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=92 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=91 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=41 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=44 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=45 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=46 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=121 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=120 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 
(807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=9 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=12 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=33 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=23 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=58 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=56 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=110 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=109 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=108 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] 
[engine.py:151:__init__] RANK=30 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=25 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=53 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=50 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=113 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=81 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=77 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=76 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=89 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=90 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 
(104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=95 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=47 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=40 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=124 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=15 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=10 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=13 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=14 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=18 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=60 STAGE=15 
LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=59 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=104 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=26 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=31 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=49 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=75 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=94 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=43 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=11 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 
(104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=16 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=17 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=24 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=78 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=42 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=8 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:30:59,063] [INFO] [engine.py:151:__init__] RANK=72 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) > using checkpoint value 6e-05 for learning rate > using checkpoint value 6e-06 for minimum learning rate > using checkpoint value 3750000 for warmup iterations > using checkpoint value 600000000 for total number of iterations > using checkpoint value cosine for decay style [2022-02-07 05:31:20,460] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 84 [2022-02-07 05:31:21,971] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 84 [2022-02-07 
05:31:22,263] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 58 [2022-02-07 05:31:22,546] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 121 [2022-02-07 05:31:22,865] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 59 [2022-02-07 05:31:23,292] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 56 [2022-02-07 05:31:23,483] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 120 [2022-02-07 05:31:23,650] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 53 [2022-02-07 05:31:23,718] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 58 [2022-02-07 05:31:23,955] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 121 [2022-02-07 05:31:24,318] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 63 [2022-02-07 05:31:24,335] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 59 [2022-02-07 05:31:24,710] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 122 [2022-02-07 05:31:24,750] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 56 [2022-02-07 05:31:24,888] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 60 [2022-02-07 05:31:25,008] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 62 [2022-02-07 05:31:25,020] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 120 [2022-02-07 05:31:25,097] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 53 [2022-02-07 05:31:25,297] 
[INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 72 [2022-02-07 05:31:25,326] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 39 [2022-02-07 05:31:25,580] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 61 [2022-02-07 05:31:25,736] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 104 [2022-02-07 05:31:25,858] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 63 [2022-02-07 05:31:25,870] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 91 [2022-02-07 05:31:26,026] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 43 [2022-02-07 05:31:26,028] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 109 [2022-02-07 05:31:26,090] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 82 [2022-02-07 05:31:26,106] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 122 [2022-02-07 05:31:26,488] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 62 [2022-02-07 05:31:26,598] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 98 [2022-02-07 05:31:26,614] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 38 [2022-02-07 05:31:26,697] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 65 [2022-02-07 05:31:26,698] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 60 [2022-02-07 05:31:26,773] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 72 [2022-02-07 05:31:26,781] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 46 [2022-02-07 05:31:26,819] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 40 [2022-02-07 05:31:26,887] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 52 [2022-02-07 05:31:26,929] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 39 [2022-02-07 05:31:27,171] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 64 [2022-02-07 05:31:27,186] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 61 [2022-02-07 05:31:27,203] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 112 [2022-02-07 05:31:27,207] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 85 [2022-02-07 05:31:27,225] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 88 [2022-02-07 05:31:27,311] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 104 [2022-02-07 05:31:27,419] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 91 [2022-02-07 05:31:27,502] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 37 [2022-02-07 05:31:27,526] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 82 [2022-02-07 05:31:27,629] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 41 [2022-02-07 05:31:27,671] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 109 [2022-02-07 05:31:27,987] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 43 [2022-02-07 05:31:28,003] [INFO] 
[engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 98 [2022-02-07 05:31:28,162] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 65 [2022-02-07 05:31:28,181] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 38 [2022-02-07 05:31:28,237] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 51 [2022-02-07 05:31:28,309] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 78 [2022-02-07 05:31:28,347] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 118 [2022-02-07 05:31:28,375] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 46 [2022-02-07 05:31:28,383] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 40 [2022-02-07 05:31:28,422] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 52 [2022-02-07 05:31:28,544] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 22 [2022-02-07 05:31:28,584] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 112 [2022-02-07 05:31:28,686] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 88 [2022-02-07 05:31:28,708] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 64 [2022-02-07 05:31:28,713] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 24 [2022-02-07 05:31:28,716] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 85 [2022-02-07 05:31:28,771] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 34 [2022-02-07 05:31:28,823] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 127 [2022-02-07 05:31:29,000] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 37 [2022-02-07 05:31:29,226] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 41 [2022-02-07 05:31:29,616] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 51 [2022-02-07 05:31:29,670] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 94 [2022-02-07 05:31:29,800] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 118 [2022-02-07 05:31:29,840] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 78 [2022-02-07 05:31:29,957] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 36 [2022-02-07 05:31:30,008] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 22 [2022-02-07 05:31:30,258] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 24 [2022-02-07 05:31:30,329] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 55 [2022-02-07 05:31:30,354] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 47 [2022-02-07 05:31:30,367] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 34 [2022-02-07 05:31:30,382] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 6 [2022-02-07 05:31:30,474] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 31 [2022-02-07 05:31:30,550] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 10 [2022-02-07 05:31:30,739] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 125 [2022-02-07 05:31:31,045] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 26 [2022-02-07 05:31:31,077] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 94 [2022-02-07 05:31:31,084] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 103 [2022-02-07 05:31:31,120] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 35 [2022-02-07 05:31:31,156] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 69 [2022-02-07 05:31:31,215] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 8 [2022-02-07 05:31:31,311] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 127 [2022-02-07 05:31:31,317] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 92 [2022-02-07 05:31:31,418] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 66 [2022-02-07 05:31:31,433] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 36 [2022-02-07 05:31:31,436] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 49 [2022-02-07 05:31:31,530] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 80 [2022-02-07 05:31:31,610] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 105 [2022-02-07 05:31:31,645] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 1 [2022-02-07 05:31:31,650] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 50 [2022-02-07 05:31:31,662] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 0 [2022-02-07 05:31:31,698] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 32 [2022-02-07 05:31:31,721] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 55 [2022-02-07 05:31:31,780] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 6 [2022-02-07 05:31:31,798] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 100 [2022-02-07 05:31:31,805] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 29 [2022-02-07 05:31:31,884] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 47 [2022-02-07 05:31:31,895] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 115 [2022-02-07 05:31:31,925] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 31 [2022-02-07 05:31:31,943] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 97 [2022-02-07 05:31:31,956] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 116 [2022-02-07 05:31:32,003] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 117 [2022-02-07 05:31:32,042] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 10 [2022-02-07 05:31:32,061] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 12 [2022-02-07 05:31:32,095] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 108 [2022-02-07 05:31:32,109] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 54 [2022-02-07 05:31:32,178] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 45 [2022-02-07 05:31:32,207] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 23 [2022-02-07 05:31:32,247] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 17 [2022-02-07 05:31:32,275] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 96 [2022-02-07 05:31:32,287] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 4 [2022-02-07 05:31:32,396] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 57 [2022-02-07 05:31:32,450] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 27 [2022-02-07 05:31:32,491] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 103 [2022-02-07 05:31:32,562] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 5 [2022-02-07 05:31:32,615] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 20 [2022-02-07 05:31:32,655] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 107 [2022-02-07 05:31:32,675] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 8 [2022-02-07 05:31:32,677] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 44 [2022-02-07 05:31:32,689] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 126 [2022-02-07 05:31:32,710] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 125 [2022-02-07 05:31:32,721] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 13 [2022-02-07 05:31:32,749] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 30 [2022-02-07 05:31:32,776] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 76 [2022-02-07 05:31:32,783] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 114 [2022-02-07 05:31:32,796] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 35 [2022-02-07 05:31:32,805] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 69 [2022-02-07 05:31:32,836] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 92 [2022-02-07 05:31:32,846] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 66 [2022-02-07 05:31:32,923] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 49 [2022-02-07 05:31:32,948] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 80 [2022-02-07 05:31:32,957] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 26 [2022-02-07 05:31:33,077] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 50 [2022-02-07 05:31:33,086] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 32 [2022-02-07 05:31:33,113] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 21 [2022-02-07 05:31:33,143] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 19 [2022-02-07 05:31:33,146] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 101 [2022-02-07 05:31:33,152] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 105 [2022-02-07 05:31:33,242] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 81 [2022-02-07 05:31:33,285] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 74 [2022-02-07 05:31:33,305] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 110 [2022-02-07 05:31:33,309] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 100 [2022-02-07 05:31:33,316] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 29 [2022-02-07 05:31:33,347] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 48 [2022-02-07 05:31:33,389] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 115 [2022-02-07 05:31:33,459] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 124 [2022-02-07 05:31:33,550] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 12 [2022-02-07 05:31:33,569] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 113 [2022-02-07 05:31:33,586] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 45 [2022-02-07 05:31:33,601] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 14 [2022-02-07 05:31:33,605] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 54 [2022-02-07 05:31:33,630] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 9 [2022-02-07 05:31:33,636] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 28 [2022-02-07 05:31:33,642] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 87 [2022-02-07 05:31:33,699] [INFO] 
[engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 1 [2022-02-07 05:31:33,699] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 111 [2022-02-07 05:31:33,729] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 102 [2022-02-07 05:31:33,732] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 117 [2022-02-07 05:31:33,783] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 0 [2022-02-07 05:31:33,785] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 23 [2022-02-07 05:31:33,794] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 17 checkpoint version 3.0 [2022-02-07 05:31:33,806] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 57 [2022-02-07 05:31:33,829] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 116 [2022-02-07 05:31:33,880] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 119 [2022-02-07 05:31:33,926] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 16 [2022-02-07 05:31:33,937] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 97 [2022-02-07 05:31:33,954] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 77 [2022-02-07 05:31:33,978] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 96 [2022-02-07 05:31:34,050] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 4 [2022-02-07 05:31:34,051] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 95 [2022-02-07 05:31:34,063] [INFO] 
[engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 44 [2022-02-07 05:31:34,065] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 33 [2022-02-07 05:31:34,141] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 11 [2022-02-07 05:31:34,178] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 89 [2022-02-07 05:31:34,185] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 20 [2022-02-07 05:31:34,223] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 67 [2022-02-07 05:31:34,254] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 114 [2022-02-07 05:31:34,281] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 13 [2022-02-07 05:31:34,289] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 5 [2022-02-07 05:31:34,324] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 107 [2022-02-07 05:31:34,327] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 30 [2022-02-07 05:31:34,357] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 27 [2022-02-07 05:31:34,433] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 90 [2022-02-07 05:31:34,441] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 15 [2022-02-07 05:31:34,529] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 126 [2022-02-07 05:31:34,530] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 76 [2022-02-07 05:31:34,530] [INFO] 
[engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 108 [2022-02-07 05:31:34,557] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 106 [2022-02-07 05:31:34,562] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 71 [2022-02-07 05:31:34,680] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 19 [2022-02-07 05:31:34,692] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 81 [2022-02-07 05:31:34,699] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 21 [2022-02-07 05:31:34,719] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 68 [2022-02-07 05:31:34,729] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 48 [2022-02-07 05:31:34,768] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 93 [2022-02-07 05:31:34,857] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 73 [2022-02-07 05:31:34,985] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 83 [2022-02-07 05:31:35,011] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 74 [2022-02-07 05:31:35,022] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 101 [2022-02-07 05:31:35,035] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 75 [2022-02-07 05:31:35,067] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 25 [2022-02-07 05:31:35,078] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 3 [2022-02-07 05:31:35,087] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 70 [2022-02-07 05:31:35,180] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 99 [2022-02-07 05:31:35,186] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 79 [2022-02-07 05:31:35,199] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 110 [2022-02-07 05:31:35,210] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 2 [2022-02-07 05:31:35,235] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 28 [2022-02-07 05:31:35,280] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 123 [2022-02-07 05:31:35,296] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 18 [2022-02-07 05:31:35,303] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 102 [2022-02-07 05:31:35,315] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 9 [2022-02-07 05:31:35,324] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 42 [2022-02-07 05:31:35,363] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 124 [2022-02-07 05:31:35,405] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 7 [2022-02-07 05:31:35,410] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 113 [2022-02-07 05:31:35,418] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 86 [2022-02-07 05:31:35,450] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 87 [2022-02-07 05:31:35,457] [INFO] 
[engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 33 [2022-02-07 05:31:35,480] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 16 [2022-02-07 05:31:35,518] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 111 [2022-02-07 05:31:35,540] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 77 [2022-02-07 05:31:35,551] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 119 [2022-02-07 05:31:35,561] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 95 [2022-02-07 05:31:35,691] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 14 [2022-02-07 05:31:35,721] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 11 [2022-02-07 05:31:35,936] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 67 [2022-02-07 05:31:35,946] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 106 [2022-02-07 05:31:36,045] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 15 [2022-02-07 05:31:36,114] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 68 [2022-02-07 05:31:36,114] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 89 [2022-02-07 05:31:36,249] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 93 [2022-02-07 05:31:36,342] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 90 [2022-02-07 05:31:36,371] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 71 [2022-02-07 05:31:36,469] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero 
partition checkpoints for rank 75
[2022-02-07 05:31:36,495] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 83
[2022-02-07 05:31:36,497] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 25
[2022-02-07 05:31:36,624] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 73
[2022-02-07 05:31:36,681] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 18
[2022-02-07 05:31:36,690] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 42
[2022-02-07 05:31:36,706] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 79
[2022-02-07 05:31:36,712] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 70
[2022-02-07 05:31:36,732] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 123
[2022-02-07 05:31:36,767] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 99
[2022-02-07 05:31:36,768] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 7
[2022-02-07 05:31:37,006] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 86
[2022-02-07 05:31:37,108] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 3
[2022-02-07 05:31:37,228] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 2
successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 at iteration 17250
time (ms) | load-checkpoint: 36874.13
/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/utils.py:279: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of
the embeddings
warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
estimated model parameters: 125.22432
estimated model parameters without embeddings: 103.368064
[after model, optimizer, and learning rate scheduler are built] datetime: 2022-02-07 05:31:37
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
train: 600000000
validation: 20008960
test: 10240
> building train, validation, and test datasets for GPT ...
> building dataset index ...
reading sizes...
reading pointers...
reading document index...
creating numpy buffer of mmap...
creating memory view of numpy buffer...
> finished creating indexed dataset in 0.085908 seconds
number of documents: 304230423
> dataset split:
train: document indices in [0, 288714672) total of 288714672 documents
validation: document indices in [288714672, 303926193) total of 15211521 documents
test: document indices in [303926193, 304230423) total of 304230 documents
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_shuffle_idx.npy
loaded indexed file in 0.218 seconds
total number of samples: 657686117
total number of epochs: 5
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_20008960ns_2048sl_43s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_20008960ns_2048sl_43s_sample_idx.npy
> loading shuffle-idx mapping from
/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_20008960ns_2048sl_43s_shuffle_idx.npy
loaded indexed file in 0.545 seconds
total number of samples: 20781483
total number of epochs: 3
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_shuffle_idx.npy
loaded indexed file in 0.066 seconds
total number of samples: 137384
total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2022-02-07 05:31:45
done with setup ...
training ...
Number of parameters: [tensor rank - pipeline rank] w/ and w/o embeddings:
time (ms) | model-and-optimizer-setup: 50033.07 | train/valid/test-data-iterators-setup: 7403.90
[001-010] 103.3651B / 103.3651B
[002-030] 103.3651B / 103.3651B
[002-002] 103.3651B / 103.3651B
[001-009] 103.3651B / 103.3651B
[002-009] 103.3651B / 103.3651B
[001-004] 103.3651B / 103.3651B
[002-004] 103.3651B / 103.3651B
[002-005] 103.3651B / 103.3651B
[003-004] 103.3651B / 103.3651B
[003-015] 103.3651B / 103.3651B
[001-015] 103.3651B / 103.3651B
[003-014] 103.3651B / 103.3651B
[001-027] 103.3651B / 103.3651B
[001-026] 103.3651B / 103.3651B
[003-027] 103.3651B / 103.3651B
[003-006] 103.3651B / 103.3651B
[002-007] 103.3651B / 103.3651B
[001-029] 103.3651B / 103.3651B
[003-029] 103.3651B / 103.3651B
[002-029] 103.3651B / 103.3651B
[001-021] 103.3651B / 103.3651B
[003-021] 103.3651B / 103.3651B
[003-019] 103.3651B / 103.3651B
[003-022] 103.3651B / 103.3651B
[001-022] 103.3651B / 103.3651B
[002-023] 103.3651B / 103.3651B
[001-023] 103.3651B / 103.3651B
[003-023] 103.3651B / 103.3651B
[002-000] 125.2243B /
103.3681B[002-001] 103.3651B / 103.3651B [001-017] 103.3651B / 103.3651B[003-016] 103.3651B / 103.3651B [002-010] 103.3651B / 103.3651B [003-030] 103.3651B / 103.3651B [003-002] 103.3651B / 103.3651B[001-002] 103.3651B / 103.3651B [003-009] 103.3651B / 103.3651B [002-014] 103.3651B / 103.3651B[002-015] 103.3651B / 103.3651B [001-014] 103.3651B / 103.3651B [002-026] 103.3651B / 103.3651B [001-006] 103.3651B / 103.3651B [003-007] 103.3651B / 103.3651B [002-012] 103.3651B / 103.3651B[003-012] 103.3651B / 103.3651B[001-013] 103.3651B / 103.3651B [001-028] 103.3651B / 103.3651B [002-028] 103.3651B / 103.3651B [001-020] 103.3651B / 103.3651B[002-021] 103.3651B / 103.3651B [003-018] 103.3651B / 103.3651B [001-019] 103.3651B / 103.3651B [003-025] 103.3651B / 103.3651B[003-024] 103.3651B / 103.3651B[002-024] 103.3651B / 103.3651B [001-025] 103.3651B / 103.3651B [002-022] 103.3651B / 103.3651B [003-000] 125.2243B / 103.3681B [002-016] 103.3651B / 103.3651B [003-017] 103.3651B / 103.3651B [003-011] 103.3651B / 103.3651B [001-011] 103.3651B / 103.3651B[003-010] 103.3651B / 103.3651B [001-030] 103.3651B / 103.3651B [002-003] 103.3651B / 103.3651B[001-003] 103.3651B / 103.3651B [001-008] 103.3651B / 103.3651B [003-008] 103.3651B / 103.3651B [002-008] 103.3651B / 103.3651B [001-005] 103.3651B / 103.3651B [000-015] 103.3651B / 103.3651B [002-027] 103.3651B / 103.3651B [002-006] 103.3651B / 103.3651B [001-007] 103.3651B / 103.3651B [003-013] 103.3651B / 103.3651B[002-013] 103.3651B / 103.3651B [001-012] 103.3651B / 103.3651B [003-028] 103.3651B / 103.3651B [003-020] 103.3651B / 103.3651B[002-020] 103.3651B / 103.3651B [002-019] 103.3651B / 103.3651B [001-018] 103.3651B / 103.3651B [001-024] 103.3651B / 103.3651B[002-025] 103.3651B / 103.3651B [000-022] 103.3651B / 103.3651B [003-001] 103.3651B / 103.3651B [001-001] 103.3651B / 103.3651B [002-017] 103.3651B / 103.3651B [001-016] 103.3651B / 103.3651B [002-011] 103.3651B / 103.3651B [001-031] 125.2273B / 103.3710B[003-031] 125.2273B 
/ 103.3710B [003-003] 103.3651B / 103.3651B [000-009] 103.3651B / 103.3651B [003-005] 103.3651B / 103.3651B [000-014] 103.3651B / 103.3651B [003-026] 103.3651B / 103.3651B [000-006] 103.3651B / 103.3651B [000-013] 103.3651B / 103.3651B [000-029] 103.3651B / 103.3651B [000-020] 103.3651B / 103.3651B [002-018] 103.3651B / 103.3651B [000-025] 103.3651B / 103.3651B [000-023] 103.3651B / 103.3651B [001-000] 125.2243B / 103.3681B [000-017] 103.3651B / 103.3651B [000-010] 103.3651B / 103.3651B [002-031] 125.2273B / 103.3710B [000-003] 103.3651B / 103.3651B [000-008] 103.3651B / 103.3651B [000-027] 103.3651B / 103.3651B [000-007] 103.3651B / 103.3651B [000-012] 103.3651B / 103.3651B [000-028] 103.3651B / 103.3651B [000-019] 103.3651B / 103.3651B [000-011] 103.3651B / 103.3651B [000-031] 125.2273B / 103.3710B [000-002] 103.3651B / 103.3651B [000-005] 103.3651B / 103.3651B [000-018] 103.3651B / 103.3651B [000-000] 125.2243B / 103.3681B [000-021] 103.3651B / 103.3651B [000-016] 103.3651B / 103.3651B [000-004] 103.3651B / 103.3651B [000-001] 103.3651B / 103.3651B [000-024] 103.3651B / 103.3651B [000-026] 103.3651B / 103.3651B [000-030] 103.3651B / 103.3651B [before the start of training step] datetime: 2022-02-07 05:31:45 [2022-02-07 05:31:45,602] [INFO] [checkpointing.py:547:forward] Activation Checkpointing Information [2022-02-07 05:31:45,602] [INFO] [checkpointing.py:548:forward] ----Partition Activations False, CPU CHECKPOINTING False [2022-02-07 05:31:45,602] [INFO] [checkpointing.py:551:forward] ----contiguous Memory Checkpointing False with 64 total layers [2022-02-07 05:31:45,602] [INFO] [checkpointing.py:554:forward] ----Synchronization False [2022-02-07 05:31:45,602] [INFO] [checkpointing.py:555:forward] ----Profiling time in checkpointing False [Rank 124] (after 17251 iterations) memory (MB) | allocated: 13262.15576171875 | max allocated: 20725.81689453125 | reserved: 24404.0 | max reserved: 24404.0 [Rank 122] (after 17251 iterations) memory (MB) | allocated: 
10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 2] (after 17251 iterations) memory (MB) | allocated: 13207.3203125 | max allocated: 20670.9365234375 | reserved: 24404.0 | max reserved: 24404.0 [Rank 6] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 123] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 126] (after 17251 iterations) memory (MB) | allocated: 13262.15576171875 | max allocated: 20726.50439453125 | reserved: 24404.0 | max reserved: 24404.0 [Rank 8] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 16] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 12] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 20] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 0] (after 17251 iterations) memory (MB) | allocated: 13210.06884765625 | max allocated: 20673.68505859375 | reserved: 24404.0 | max reserved: 24404.0 [Rank 4] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 40] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 24] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 56] (after 17251 
iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 28] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 44] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.7763671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 32] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0[Rank 36] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 52] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 48] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 68] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 64] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.7763671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 60] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 88] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 72] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 80] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max 
reserved: 20072.0 [Rank 84] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.7763671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 92] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 96] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 104] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.7763671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 116] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 100] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 108] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 112] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 120] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 22] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 10] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 18] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 26] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max 
allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 30] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 54] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 76] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 34] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 38] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 14] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 50] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 46] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.7763671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 58] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0[Rank 62] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 42] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 66] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 86] (after 17251 iterations) memory 
(MB) | allocated: 10797.55712890625 | max allocated: 16957.7763671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 78] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 70] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 74] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 125] (after 17251 iterations) memory (MB) | allocated: 13262.15576171875 | max allocated: 20725.81689453125 | reserved: 24404.0 | max reserved: 24404.0 [Rank 94] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 82] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 98] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 90] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 iteration 17251/ 292968 | consumed samples: 35330048 | consumed tokens: 18915475456 | elapsed time per iteration (ms): 236745.0 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.881958E+00 | loss scale: 32768.0 | grad norm: 36423.291 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.009 | TFLOPs: 56.34 | [Rank 102] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 127] (after 17251 iterations) memory (MB) | allocated: 
13262.15576171875 | max allocated: 20725.81689453125 | reserved: 24404.0 | max reserved: 24404.0 [Rank 118] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 114] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 110] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 106] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.7763671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 3] (after 17251 iterations) memory (MB) | allocated: 13208.5361328125 | max allocated: 20672.15234375 | reserved: 24404.0 | max reserved: 24404.0 [Rank 7] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 11] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 23] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 15] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 19] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 31] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 47] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.7763671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 39] (after 17251 
iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 27] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 55] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 35] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 51] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 43] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 71] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 63] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 67] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 59] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 79] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 75] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 87] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max 
reserved: 20072.0 [Rank 83] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 103] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 95] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 111] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 119] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 91] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 99] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 115] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 107] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.7763671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 5] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 9] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 1] (after 17251 iterations) memory (MB) | allocated: 13207.3203125 | max allocated: 20670.9365234375 | reserved: 24404.0 | max reserved: 24404.0 [Rank 25] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 
16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 17] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 33] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16958.71533203125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 21] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 13] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 29] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 61] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 45] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 37] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 41] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 53] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0[Rank 49] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 57] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 65] (after 17251 iterations) memory (MB) | 
allocated: 10797.55712890625 | max allocated: 16957.7763671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 73] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 77] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 81] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 101] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 85] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.7763671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 89] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0[Rank 93] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 109] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 105] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.7763671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 97] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 117] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 69] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 
121] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0
[Rank 113] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0
Traceback (most recent call last):
  File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 239, in <module>
    pretrain(train_valid_test_datasets_provider, model_provider, forward_step,
  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 183, in pretrain
    iteration = train(forward_step_func,
  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 867, in train
    train_step(forward_step_func,
  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/training.py", line 432, in train_step
    loss = model[0].train_batch(data_iter=data_iterator)
  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/pipe/engine.py", line 329, in train_batch
    self._exec_schedule(sched)
  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/pipe/engine.py", line 1335, in _exec_schedule
    self._exec_instr(**cmd.kwargs)
  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/pipe/engine.py", line 733, in _exec_backward_pass
    torch.autograd.backward(tensors=out_tensors, grad_tensors=grad_tensors)
  File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/autograd/__init__.py", line 154, in backward
    Variable._execution_engine.run_backward(
  File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/autograd/function.py", line 199, in apply
    return user_fn(self, *args)
  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/activation_checkpointing/checkpointing.py", line 693, in backward
    outputs = ctx.run_function(*detached_inputs)
  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/runtime/pipe/module.py", line 330, in exec_func
    inputs = layer(inputs)
  File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/model/transformer.py", line 591, in forward
    return super().forward(hidden_states, attention_mask, **kwargs)
  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/model/transformer.py", line 482, in forward
    self.self_attention(layernorm_output,
  File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/model/transformer.py", line 218, in forward
    mixed_x_layer, _ = self.query_key_value(hidden_states)
  File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/mpu/layers.py", line 329, in forward
    output_parallel = F.linear(input_parallel, self.weight, bias)
  File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/nn/functional.py", line 1848, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)`
terminate called after throwing an instance of 'c10::CUDAError'
  what():  CUDA error: uncorrectable NVLink error detected during the execution
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1640811806235/work/c10/cuda/CUDACachingAllocator.cpp:1211 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x148853f32d62 in /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x1c613 (0x148899450613 in /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1a2 (0x148899451022 in /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0xa4 (0x148853f1c314 in /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #4: <unknown function> + 0x299129 (0x1488ed458129 in /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0xada181 (0x1488edc99181 in /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #6: THPVariable_subclass_dealloc(_object*) + 0x292 (0x1488edc99482 in /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #7: <unknown function> + 0x110b7c (0x55d581ae8b7c in /gpfswork/rech/six/commun/conda/cutting-edge/bin/python)
frame #8: <unknown function> + 0x1105b9 (0x55d581ae85b9 in /gpfswork/rech/six/commun/conda/cutting-edge/bin/python)
frame #9: <unknown function> + 0x1105a3 (0x55d581ae85a3 in /gpfswork/rech/six/commun/conda/cutting-edge/bin/python)
frame #10: <unknown function> + 0x1105a3 (0x55d581ae85a3 in /gpfswork/rech/six/commun/conda/cutting-edge/bin/python)
frame #11: <unknown function> + 0x177917 (0x55d581b4f917 in /gpfswork/rech/six/commun/conda/cutting-edge/bin/python)
frame #12: PyDict_SetItemString + 0x4c (0x55d581b5286c in /gpfswork/rech/six/commun/conda/cutting-edge/bin/python)
frame #13: PyImport_Cleanup + 0xac (0x55d581bc40ec in /gpfswork/rech/six/commun/conda/cutting-edge/bin/python)
frame #14: Py_FinalizeEx + 0x79 (0x55d581c2a589 in /gpfswork/rech/six/commun/conda/cutting-edge/bin/python)
frame #15: Py_RunMain + 0x1bc (0x55d581c2d8fc in /gpfswork/rech/six/commun/conda/cutting-edge/bin/python)
frame #16: Py_BytesMain + 0x39 (0x55d581c2dce9 in /gpfswork/rech/six/commun/conda/cutting-edge/bin/python)
frame #17: __libc_start_main + 0xf3 (0x14892113c6a3 in /lib64/libc.so.6)
frame #18: <unknown function> + 0x1f7847 (0x55d581bcf847 in /gpfswork/rech/six/commun/conda/cutting-edge/bin/python)

WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 618016 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 618017 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 618018 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 618019 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 618020 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 618022 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 618023 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 5 (pid: 618021) of binary: /gpfswork/rech/six/commun/conda/cutting-edge/bin/python
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
slurmstepd: error: *** STEP 1793937.0 ON jean-zay-iam01 CANCELLED AT 2022-02-07T05:40:01 ***
WARNING:torch.distributed.elastic.multiprocessing.api:Sending
process 638262 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 638263 closing signal SIGTERM
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 618737 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 638264 closing signal SIGTERM
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 615851 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 618738 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 615586 closing signal SIGTERM
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 615587 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 615852 closing signal SIGTERM
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 618647 closing signal SIGTERM
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1898755 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 638265 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 709278 closing signal SIGTERM
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 709279 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 618739 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1898756 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 614650 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 615588 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 615853 closing signal SIGTERM
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1898757 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 620028 closing signal SIGTERM
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 613305 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 618740 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 620029 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 613306 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 638266 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 615589 closing signal SIGTERM
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 618648 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 614651 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 709280 closing signal SIGTERM
WARNING:torch.distributed.elastic.agent.server.api:Received 15
death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 618649 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1898758 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 615590 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 638267 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 618741 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 613307 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 620030 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 618742 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 617093 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 709281 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 614280 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 615854 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 617094 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 616298 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 616114 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 616299 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending 
process 616115 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 614652 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1898759 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 615591 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 618650 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 614653 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 615855 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 617095 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 709282 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 616300 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 613268 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 638268 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 620031 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 613308 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 613269 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 613270 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1898760 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 616116 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 618651 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 615592 closing signal SIGTERM 
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 615856 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 614281 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 614282 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 614654 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1898761 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 638269 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 617096 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 613271 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 616301 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 615857 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 618652 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 613309 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 616117 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1898762 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 617097 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 614283 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 615858 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 709283 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 618653 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 613310 closing signal 
SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 616118 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 620032 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 613311 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 614284 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 616302 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 613312 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 620033 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 613272 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 617098 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 709284 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 618654 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 616119 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 614285 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 709285 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 616303 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 615593 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 617099 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 620034 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 617100 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 613273 closing signal 
SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 613274 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 614286 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 616304 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 614287 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 613275 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 614655 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 620035 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 616120 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 616121 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 614656 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 614657 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 616305 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 618743 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 618744 closing signal SIGTERM /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun. Note that --use_env is set by default in torchrun. If your script expects `--local_rank` argument to be set, please change it to read from `os.environ['LOCAL_RANK']` instead. 
See https://pytorch.org/docs/stable/distributed.html#launch-utility for further instructions warnings.warn( /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun. Note that --use_env is set by default in torchrun.
If your script expects `--local_rank` argument to be set, please change it to read from `os.environ['LOCAL_RANK']` instead. See https://pytorch.org/docs/stable/distributed.html#launch-utility for further instructions warnings.warn(
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun. Note that --use_env is set by default in torchrun. If your script expects `--local_rank` argument to be set, please change it to read from `os.environ['LOCAL_RANK']` instead. See https://pytorch.org/docs/stable/distributed.html#launch-utility for further instructions warnings.warn(
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
using world size: 128, data-parallel-size: 1, tensor-model-parallel size: 4, pipeline-model-parallel size: 32
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.95
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. None
consumed_train_samples .......................... 0
consumed_train_tokens ........................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
curriculum_learning ............................. False
data_impl ....................................... mmap
data_parallel_size .............................. 1
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.1794034.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embed_layernorm ................................. True
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 150
eval_iters ...................................... 5
eval_only ....................................... None
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1185
exit_interval ................................... None
ffn_hidden_size ................................. 46400
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
gigaflos_no_embeds .............................. 0
global_batch_size ............................... 2048
glu_activation .................................. None
hidden_dropout .................................. 0.1
hidden_size ..................................... 11600
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.006
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 145
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 1
log_learning_rate_to_tensorboard ................ True
log_level ....................................... None
log_level_replica ............................... None
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_on_targets_only ............................ False
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 6e-05
lr_decay_iters .................................. None
lr_decay_samples ................................ None
lr_decay_style .................................. cosine
lr_decay_tokens ................................. 260000000000
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 3750000
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... 2048
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt
micro_batch_size ................................ 1
min_loss_scale .................................. 1.0
min_lr .......................................... 6e-06
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 80
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 64
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 32
position_embedding_type ......................... PositionEmbeddingType.absolute
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... None
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
reweight_loss_based_on_position_frequency ....... False
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
save_interval ................................... 50
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 43
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
skip_train_iteration_range ...................... None
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 4
tensorboard_debug_dir ........................... None
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs//tensorboard/cl-a100
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
test_weighted_split_names ....................... None
test_weighted_split_paths ....................... None
test_weighted_split_splits ...................... None
test_weighted_split_weights ..................... None
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 600000000
train_tokens .................................... 300000000000
train_weighted_split_paths ...................... None
use_bnb_optimizer ............................... False
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
valid_weighted_split_names ...................... None
valid_weighted_split_paths ...................... None
valid_weighted_split_splits ..................... None
valid_weighted_split_weights .................... None
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 128
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
setting number of micro-batches to constant 2048
> building GPT2BPETokenizer tokenizer ...
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
> setting tensorboard ...
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
[WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] torch version .................... 1.10.2 torch cuda version ............... 11.3 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] deepspeed info ................... 0.6.0+ba9c4cc7, ba9c4cc7, master deepspeed wheel compiled w. ...... torch 1.10, cuda 11.3 **** Git info for Megatron: git_hash=f689231 git_branch=log-grad-norm **** > initializing torch distributed ... > initializing tensor model parallel with size 4 > initializing pipeline model parallel with size 32 > setting random seeds to 43 ... [2022-02-07 05:40:21,465] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2761 and data parallel seed: 43 > compiling dataset index builder ... make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/data' make: Nothing to be done for 'default'. make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/data' >>> done with dataset index builder. Compilation time: 0.169 seconds > compiling and loading fused kernels ... 
/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:295: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module scaled_upper_triang_masked_softmax_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module scaled_upper_triang_masked_softmax_cuda... /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:295: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! 
warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module scaled_masked_softmax_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module scaled_masked_softmax_cuda... /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:295: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module fused_mix_prec_layer_norm_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module fused_mix_prec_layer_norm_cuda... >>> done with compiling and loading fused kernels. Compilation time: 8.013 seconds time to initialize megatron (seconds): 61.648 [after megatron is initialized] datetime: 2022-02-07 05:40:29 building GPT model ... 
[2022-02-07 05:40:29,688] [INFO] [utils.py:824:see_memory_usage] Before Building Model [2022-02-07 05:40:29,688] [INFO] [utils.py:825:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB [2022-02-07 05:40:29,689] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 45.72 GB, percent = 9.1% SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=1, data=0, model=0): 4, ProcessCoord(pipe=1, data=0, model=1): 5, ProcessCoord(pipe=1, data=0, model=2): 6, ProcessCoord(pipe=1, data=0, model=3): 7, ProcessCoord(pipe=2, data=0, model=0): 8, ProcessCoord(pipe=2, data=0, model=1): 9, ProcessCoord(pipe=2, data=0, model=2): 10, ProcessCoord(pipe=2, data=0, model=3): 11, ProcessCoord(pipe=3, data=0, model=0): 12, ProcessCoord(pipe=3, data=0, model=1): 13, ProcessCoord(pipe=3, data=0, model=2): 14, ProcessCoord(pipe=3, data=0, model=3): 15, ProcessCoord(pipe=4, data=0, model=0): 16, ProcessCoord(pipe=4, data=0, model=1): 17, ProcessCoord(pipe=4, data=0, model=2): 18, ProcessCoord(pipe=4, data=0, model=3): 19, ProcessCoord(pipe=5, data=0, model=0): 20, ProcessCoord(pipe=5, data=0, model=1): 21, ProcessCoord(pipe=5, data=0, model=2): 22, ProcessCoord(pipe=5, data=0, model=3): 23, ProcessCoord(pipe=6, data=0, model=0): 24, ProcessCoord(pipe=6, data=0, model=1): 25, ProcessCoord(pipe=6, data=0, model=2): 26, ProcessCoord(pipe=6, data=0, model=3): 27, ProcessCoord(pipe=7, data=0, model=0): 28, ProcessCoord(pipe=7, data=0, model=1): 29, ProcessCoord(pipe=7, data=0, model=2): 30, ProcessCoord(pipe=7, data=0, model=3): 31, ProcessCoord(pipe=8, data=0, model=0): 32, ProcessCoord(pipe=8, data=0, model=1): 33, ProcessCoord(pipe=8, data=0, model=2): 34, ProcessCoord(pipe=8, data=0, model=3): 35, ProcessCoord(pipe=9, data=0, model=0): 36, ProcessCoord(pipe=9, data=0, model=1): 
37, ProcessCoord(pipe=9, data=0, model=2): 38, ProcessCoord(pipe=9, data=0, model=3): 39, ProcessCoord(pipe=10, data=0, model=0): 40, ProcessCoord(pipe=10, data=0, model=1): 41, ProcessCoord(pipe=10, data=0, model=2): 42, ProcessCoord(pipe=10, data=0, model=3): 43, ProcessCoord(pipe=11, data=0, model=0): 44, ProcessCoord(pipe=11, data=0, model=1): 45, ProcessCoord(pipe=11, data=0, model=2): 46, ProcessCoord(pipe=11, data=0, model=3): 47, ProcessCoord(pipe=12, data=0, model=0): 48, ProcessCoord(pipe=12, data=0, model=1): 49, ProcessCoord(pipe=12, data=0, model=2): 50, ProcessCoord(pipe=12, data=0, model=3): 51, ProcessCoord(pipe=13, data=0, model=0): 52, ProcessCoord(pipe=13, data=0, model=1): 53, ProcessCoord(pipe=13, data=0, model=2): 54, ProcessCoord(pipe=13, data=0, model=3): 55, ProcessCoord(pipe=14, data=0, model=0): 56, ProcessCoord(pipe=14, data=0, model=1): 57, ProcessCoord(pipe=14, data=0, model=2): 58, ProcessCoord(pipe=14, data=0, model=3): 59, ProcessCoord(pipe=15, data=0, model=0): 60, ProcessCoord(pipe=15, data=0, model=1): 61, ProcessCoord(pipe=15, data=0, model=2): 62, ProcessCoord(pipe=15, data=0, model=3): 63, ProcessCoord(pipe=16, data=0, model=0): 64, ProcessCoord(pipe=16, data=0, model=1): 65, ProcessCoord(pipe=16, data=0, model=2): 66, ProcessCoord(pipe=16, data=0, model=3): 67, ProcessCoord(pipe=17, data=0, model=0): 68, ProcessCoord(pipe=17, data=0, model=1): 69, ProcessCoord(pipe=17, data=0, model=2): 70, ProcessCoord(pipe=17, data=0, model=3): 71, ProcessCoord(pipe=18, data=0, model=0): 72, ProcessCoord(pipe=18, data=0, model=1): 73, ProcessCoord(pipe=18, data=0, model=2): 74, ProcessCoord(pipe=18, data=0, model=3): 75, ProcessCoord(pipe=19, data=0, model=0): 76, ProcessCoord(pipe=19, data=0, model=1): 77, ProcessCoord(pipe=19, data=0, model=2): 78, ProcessCoord(pipe=19, data=0, model=3): 79, ProcessCoord(pipe=20, data=0, model=0): 80, ProcessCoord(pipe=20, data=0, model=1): 81, ProcessCoord(pipe=20, data=0, model=2): 82, 
ProcessCoord(pipe=20, data=0, model=3): 83, ProcessCoord(pipe=21, data=0, model=0): 84, ProcessCoord(pipe=21, data=0, model=1): 85, ProcessCoord(pipe=21, data=0, model=2): 86, ProcessCoord(pipe=21, data=0, model=3): 87, ProcessCoord(pipe=22, data=0, model=0): 88, ProcessCoord(pipe=22, data=0, model=1): 89, ProcessCoord(pipe=22, data=0, model=2): 90, ProcessCoord(pipe=22, data=0, model=3): 91, ProcessCoord(pipe=23, data=0, model=0): 92, ProcessCoord(pipe=23, data=0, model=1): 93, ProcessCoord(pipe=23, data=0, model=2): 94, ProcessCoord(pipe=23, data=0, model=3): 95, ProcessCoord(pipe=24, data=0, model=0): 96, ProcessCoord(pipe=24, data=0, model=1): 97, ProcessCoord(pipe=24, data=0, model=2): 98, ProcessCoord(pipe=24, data=0, model=3): 99, ProcessCoord(pipe=25, data=0, model=0): 100, ProcessCoord(pipe=25, data=0, model=1): 101, ProcessCoord(pipe=25, data=0, model=2): 102, ProcessCoord(pipe=25, data=0, model=3): 103, ProcessCoord(pipe=26, data=0, model=0): 104, ProcessCoord(pipe=26, data=0, model=1): 105, ProcessCoord(pipe=26, data=0, model=2): 106, ProcessCoord(pipe=26, data=0, model=3): 107, ProcessCoord(pipe=27, data=0, model=0): 108, ProcessCoord(pipe=27, data=0, model=1): 109, ProcessCoord(pipe=27, data=0, model=2): 110, ProcessCoord(pipe=27, data=0, model=3): 111, ProcessCoord(pipe=28, data=0, model=0): 112, ProcessCoord(pipe=28, data=0, model=1): 113, ProcessCoord(pipe=28, data=0, model=2): 114, ProcessCoord(pipe=28, data=0, model=3): 115, ProcessCoord(pipe=29, data=0, model=0): 116, ProcessCoord(pipe=29, data=0, model=1): 117, ProcessCoord(pipe=29, data=0, model=2): 118, ProcessCoord(pipe=29, data=0, model=3): 119, ProcessCoord(pipe=30, data=0, model=0): 120, ProcessCoord(pipe=30, data=0, model=1): 121, ProcessCoord(pipe=30, data=0, model=2): 122, ProcessCoord(pipe=30, data=0, model=3): 123, ProcessCoord(pipe=31, data=0, model=0): 124, ProcessCoord(pipe=31, data=0, model=1): 125, ProcessCoord(pipe=31, data=0, model=2): 126, ProcessCoord(pipe=31, data=0, 
model=3): 127} [2022-02-07 05:40:31,408] [INFO] [module.py:365:_partition_layers] Partitioning pipeline stages with method type:transformer stage=0 layers=5 0: _to_float16 1: EmbeddingPipe 2: 3: ParallelTransformerLayerPipe 4: ParallelTransformerLayerPipe stage=1 layers=2 5: ParallelTransformerLayerPipe 6: ParallelTransformerLayerPipe stage=2 layers=2 7: ParallelTransformerLayerPipe 8: ParallelTransformerLayerPipe stage=3 layers=2 9: ParallelTransformerLayerPipe 10: ParallelTransformerLayerPipe stage=4 layers=2 11: ParallelTransformerLayerPipe 12: ParallelTransformerLayerPipe stage=5 layers=2 13: ParallelTransformerLayerPipe 14: ParallelTransformerLayerPipe stage=6 layers=2 15: ParallelTransformerLayerPipe 16: ParallelTransformerLayerPipe stage=7 layers=2 17: ParallelTransformerLayerPipe 18: ParallelTransformerLayerPipe stage=8 layers=2 19: ParallelTransformerLayerPipe 20: ParallelTransformerLayerPipe stage=9 layers=2 21: ParallelTransformerLayerPipe 22: ParallelTransformerLayerPipe stage=10 layers=2 23: ParallelTransformerLayerPipe 24: ParallelTransformerLayerPipe stage=11 layers=2 25: ParallelTransformerLayerPipe 26: ParallelTransformerLayerPipe stage=12 layers=2 27: ParallelTransformerLayerPipe 28: ParallelTransformerLayerPipe stage=13 layers=2 29: ParallelTransformerLayerPipe 30: ParallelTransformerLayerPipe stage=14 layers=2 31: ParallelTransformerLayerPipe 32: ParallelTransformerLayerPipe stage=15 layers=2 33: ParallelTransformerLayerPipe 34: ParallelTransformerLayerPipe stage=16 layers=2 35: ParallelTransformerLayerPipe 36: ParallelTransformerLayerPipe stage=17 layers=2 37: ParallelTransformerLayerPipe 38: ParallelTransformerLayerPipe stage=18 layers=2 39: ParallelTransformerLayerPipe 40: ParallelTransformerLayerPipe stage=19 layers=2 41: ParallelTransformerLayerPipe 42: ParallelTransformerLayerPipe stage=20 layers=2 43: ParallelTransformerLayerPipe 44: ParallelTransformerLayerPipe stage=21 layers=2 45: ParallelTransformerLayerPipe 46: 
ParallelTransformerLayerPipe stage=22 layers=2 47: ParallelTransformerLayerPipe 48: ParallelTransformerLayerPipe stage=23 layers=2 49: ParallelTransformerLayerPipe 50: ParallelTransformerLayerPipe stage=24 layers=2 51: ParallelTransformerLayerPipe 52: ParallelTransformerLayerPipe stage=25 layers=2 53: ParallelTransformerLayerPipe 54: ParallelTransformerLayerPipe stage=26 layers=2 55: ParallelTransformerLayerPipe 56: ParallelTransformerLayerPipe stage=27 layers=2 57: ParallelTransformerLayerPipe 58: ParallelTransformerLayerPipe stage=28 layers=2 59: ParallelTransformerLayerPipe 60: ParallelTransformerLayerPipe stage=29 layers=2 61: ParallelTransformerLayerPipe 62: ParallelTransformerLayerPipe stage=30 layers=2 63: ParallelTransformerLayerPipe 64: ParallelTransformerLayerPipe stage=31 layers=6 65: ParallelTransformerLayerPipe 66: ParallelTransformerLayerPipe 67: 68: MixedFusedLayerNorm 69: EmbeddingPipe 70: float16_to_fp32 loss: CrossEntropy [2022-02-07 05:40:32,015] [INFO] [utils.py:824:see_memory_usage] After Building Model [2022-02-07 05:40:32,016] [INFO] [utils.py:825:see_memory_usage] MA 1.88 GB Max_MA 1.88 GB CA 1.91 GB Max_CA 2 GB [2022-02-07 05:40:32,016] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 46.26 GB, percent = 9.2% setting training iterations to 292968 > learning rate decay style: cosine DeepSpeed is enabled. 
[2022-02-07 05:40:32,102] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed info: version=0.6.0+ba9c4cc7, git-hash=ba9c4cc7, git-branch=master [2022-02-07 05:40:32,930] [INFO] [engine.py:275:__init__] DeepSpeed Flops Profiler Enabled: False [2022-02-07 05:40:32,930] [INFO] [engine.py:1099:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer [2022-02-07 05:40:32,930] [INFO] [engine.py:1105:_configure_optimizer] Using client Optimizer as basic optimizer [2022-02-07 05:40:32,930] [INFO] [engine.py:1121:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam [2022-02-07 05:40:32,930] [INFO] [utils.py:48:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type= [2022-02-07 05:40:32,930] [INFO] [logging.py:69:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer [2022-02-07 05:40:32,931] [INFO] [stage_1_and_2.py:121:__init__] Reduce bucket size 500000000 [2022-02-07 05:40:32,931] [INFO] [stage_1_and_2.py:122:__init__] Allgather bucket size 500000000 [2022-02-07 05:40:32,931] [INFO] [stage_1_and_2.py:123:__init__] CPU Offload: False [2022-02-07 05:40:32,931] [INFO] [stage_1_and_2.py:124:__init__] Round robin gradient partitioning: False Rank: 104 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 105 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 106 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 107 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 109 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 110 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 111 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 108 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 112 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 113 partition count [1, 1] and sizes[(807360000, False), 
(179800, False)] Rank: 4 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 103 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 122 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 5 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 123 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 24 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 75 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 74 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 25 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 51 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 30 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 14 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 31 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 71 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 53 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 15 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 70 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 65 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 64 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 50 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 52 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 59 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 17 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 58 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 16 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 
102 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 45 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 44 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 97 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 96 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 84 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 90 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 85 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 91 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 11 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 118 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 10 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 119 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 61 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 60 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 116 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 99 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 98 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 117 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 67 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 66 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 56 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 114 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 57 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 115 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 28 partition 
count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 29 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 101 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 69 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 100 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 27 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 55 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 77 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 26 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 76 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 93 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 54 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 68 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 7 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 8 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 92 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 9 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 62 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 63 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 78 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 79 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 6 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 46 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 82 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 83 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 47 partition count [1, 1] and 
sizes[(807360000, False), (179800, False)] Rank: 34 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 18 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 19 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 35 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 39 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 38 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 12 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 13 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 37 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 33 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 36 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 32 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 87 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 22 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 23 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 86 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 43 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 42 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 81 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 80 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 48 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 49 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 95 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 94 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 21 partition count [1, 1] and sizes[(807360000, 
False), (179800, False)] Rank: 20 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 73 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 72 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 40 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 41 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 89 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 88 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 0 partition count [1, 1] and sizes[(978123600, False), (191400, False)] Rank: 1 partition count [1, 1] and sizes[(978123600, False), (191400, False)] Rank: 127 partition count [1, 1] and sizes[(978123600, False), (214600, False)] Rank: 126 partition count [1, 1] and sizes[(978123600, False), (214600, False)] Rank: 3 partition count [1, 1] and sizes[(978123600, False), (191400, False)] Rank: 2 partition count [1, 1] and sizes[(978123600, False), (191400, False)] Rank: 121 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 120 partition count [1, 1] and sizes[(807360000, False), (179800, False)] Rank: 124 partition count [1, 1] and sizes[(978123600, False), (214600, False)] Rank: 125 partition count [1, 1] and sizes[(978123600, False), (214600, False)] [2022-02-07 05:40:37,389] [INFO] [utils.py:824:see_memory_usage] Before initializing optimizer states [2022-02-07 05:40:37,390] [INFO] [utils.py:825:see_memory_usage] MA 5.47 GB Max_MA 7.29 GB CA 9.25 GB Max_CA 9 GB [2022-02-07 05:40:37,390] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 46.19 GB, percent = 9.2% [2022-02-07 05:40:37,461] [INFO] [utils.py:824:see_memory_usage] After initializing optimizer states [2022-02-07 05:40:37,462] [INFO] [utils.py:825:see_memory_usage] MA 12.76 GB Max_MA 16.41 GB CA 20.19 GB Max_CA 20 GB [2022-02-07 05:40:37,462] [INFO] [utils.py:833:see_memory_usage] CPU Virtual 
Memory: used = 46.19 GB, percent = 9.2% [2022-02-07 05:40:37,462] [INFO] [stage_1_and_2.py:493:__init__] optimizer state initialized [2022-02-07 05:40:37,485] [INFO] [utils.py:824:see_memory_usage] After initializing ZeRO optimizer [2022-02-07 05:40:37,486] [INFO] [utils.py:825:see_memory_usage] MA 12.76 GB Max_MA 12.76 GB CA 20.19 GB Max_CA 20 GB [2022-02-07 05:40:37,486] [INFO] [utils.py:833:see_memory_usage] CPU Virtual Memory: used = 46.19 GB, percent = 9.2% [2022-02-07 05:40:37,486] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam [2022-02-07 05:40:37,486] [INFO] [engine.py:805:_configure_lr_scheduler] DeepSpeed using client LR scheduler [2022-02-07 05:40:37,486] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed LR Scheduler = [2022-02-07 05:40:37,486] [INFO] [logging.py:69:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.95), (0.9, 0.95)] [2022-02-07 05:40:37,486] [INFO] [config.py:1058:print] DeepSpeedEngine configuration: [2022-02-07 05:40:37,486] [INFO] [config.py:1062:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2022-02-07 05:40:37,486] [INFO] [config.py:1062:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] amp_enabled .................. False [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] amp_params ................... False [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] autotuning_config ............ 
{ "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": null, "exps_dir": null, "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 } [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] bfloat16_enabled ............. False [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] checkpoint_tag_validation_enabled True [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] checkpoint_tag_validation_fail False [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] communication_data_type ...... None [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] curriculum_enabled ........... True [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] curriculum_params ............ {'curriculum_type': 'seqlen', 'min_difficulty': 64, 'max_difficulty': 2048, 'schedule_type': 'fixed_linear', 'schedule_config': {'total_curriculum_step': 36000, 'difficulty_step': 8}} [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] dataloader_drop_last ......... False [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] disable_allgather ............ False [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] dump_state ................... False [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1} [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] eigenvalue_enabled ........... 
False [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] eigenvalue_gas_boundary_resolution 1 [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] eigenvalue_layer_name ........ bert.encoder.layer [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] eigenvalue_layer_num ......... 0 [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] eigenvalue_max_iter .......... 100 [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] eigenvalue_stability ......... 1e-06 [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] eigenvalue_tol ............... 0.01 [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] eigenvalue_verbose ........... False [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] elasticity_enabled ........... False [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] fp16_enabled ................. True [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] fp16_master_weights_and_gradients False [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] fp16_mixed_quantize .......... False [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] global_rank .................. 0 [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] gradient_accumulation_steps .. 2048 [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] gradient_clipping ............ 1.0 [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] gradient_predivide_factor .... 1.0 [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] initial_dynamic_scale ........ 4096 [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] loss_scale ................... 0 [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] memory_breakdown ............. False [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] optimizer_legacy_fusion ...... 
False [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] optimizer_name ............... None [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] optimizer_params ............. None [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] pld_enabled .................. False [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] pld_params ................... False [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] prescale_gradients ........... False [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] quantize_change_rate ......... 0.001 [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] quantize_groups .............. 1 [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] quantize_offset .............. 1000 [2022-02-07 05:40:37,487] [INFO] [config.py:1062:print] quantize_period .............. 1000 [2022-02-07 05:40:37,488] [INFO] [config.py:1062:print] quantize_rounding ............ 0 [2022-02-07 05:40:37,488] [INFO] [config.py:1062:print] quantize_start_bits .......... 16 [2022-02-07 05:40:37,488] [INFO] [config.py:1062:print] quantize_target_bits ......... 8 [2022-02-07 05:40:37,488] [INFO] [config.py:1062:print] quantize_training_enabled .... False [2022-02-07 05:40:37,488] [INFO] [config.py:1062:print] quantize_type ................ 0 [2022-02-07 05:40:37,488] [INFO] [config.py:1062:print] quantize_verbose ............. False [2022-02-07 05:40:37,488] [INFO] [config.py:1062:print] scheduler_name ............... None [2022-02-07 05:40:37,488] [INFO] [config.py:1062:print] scheduler_params ............. None [2022-02-07 05:40:37,488] [INFO] [config.py:1062:print] sparse_attention ............. None [2022-02-07 05:40:37,488] [INFO] [config.py:1062:print] sparse_gradients_enabled ..... 
False [2022-02-07 05:40:37,488] [INFO] [config.py:1062:print] steps_per_print .............. 2000 [2022-02-07 05:40:37,488] [INFO] [config.py:1062:print] tensorboard_enabled .......... False [2022-02-07 05:40:37,488] [INFO] [config.py:1062:print] tensorboard_job_name ......... DeepSpeedJobName [2022-02-07 05:40:37,488] [INFO] [config.py:1062:print] tensorboard_output_path ...... [2022-02-07 05:40:37,488] [INFO] [config.py:1062:print] train_batch_size ............. 2048 [2022-02-07 05:40:37,488] [INFO] [config.py:1062:print] train_micro_batch_size_per_gpu 1 [2022-02-07 05:40:37,488] [INFO] [config.py:1062:print] use_quantizer_kernel ......... False [2022-02-07 05:40:37,488] [INFO] [config.py:1062:print] wall_clock_breakdown ......... False [2022-02-07 05:40:37,488] [INFO] [config.py:1062:print] world_size ................... 1 [2022-02-07 05:40:37,488] [INFO] [config.py:1062:print] zero_allow_untested_optimizer False [2022-02-07 05:40:37,488] [INFO] [config.py:1062:print] zero_config .................. { "stage": 1, "contiguous_gradients": true, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": false, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_16bit_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false } [2022-02-07 05:40:37,488] [INFO] [config.py:1062:print] zero_enabled ................. True [2022-02-07 05:40:37,488] [INFO] [config.py:1062:print] zero_optimization_stage ...... 
1 [2022-02-07 05:40:37,488] [INFO] [config.py:1064:print] json = { "train_micro_batch_size_per_gpu": 1, "train_batch_size": 2.048000e+03, "gradient_clipping": 1.0, "elastic_checkpoint": true, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "curriculum_learning": { "enabled": true, "curriculum_type": "seqlen", "min_difficulty": 64, "max_difficulty": 2.048000e+03, "schedule_type": "fixed_linear", "schedule_config": { "total_curriculum_step": 3.600000e+04, "difficulty_step": 8 } }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false } [2022-02-07 05:40:37,488] [INFO] [engine.py:93:__init__] CONFIG: micro_batches=2048 micro_batch_size=1 [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=7 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=3 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=5 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=0 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=4 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=70 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 
(104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=69 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=1 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=2 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=6 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=67 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=66 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=71 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=68 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=65 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=64 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 
(807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=98 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=102 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=97 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=96 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=35 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=39 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=38 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=101 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=103 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] 
[engine.py:151:__init__] RANK=100 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=53 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=55 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=48 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=115 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=117 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=113 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=34 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=37 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=99 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 
(104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=52 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=84 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=83 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=86 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=82 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=36 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=18 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=16 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=20 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=23 STAGE=5 
LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=116 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=118 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=85 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=29 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=26 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=28 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=32 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=22 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=114 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 
(104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=119 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=93 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=89 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=90 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=91 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=50 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=54 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=51 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=77 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=79 STAGE=19 LAYERS=2 [41, 43) 
STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=73 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=81 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=87 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=24 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=27 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=60 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=63 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=56 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=9 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 
05:40:39,834] [INFO] [engine.py:151:__init__] RANK=10 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=8 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=33 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=46 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=45 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=17 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=21 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=112 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=110 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=109 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) 
TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=105 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=95 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=94 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=127 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=124 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=122 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=123 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=49 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=74 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] 
[engine.py:151:__init__] RANK=80 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=31 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=30 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=61 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=57 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=12 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=14 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=47 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=41 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M) [2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=19 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) 
UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=108 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=111 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=107 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=106 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=92 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=88 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=125 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=126 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978338200 (978.338M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=78 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=25 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=58 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=11 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=42 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=43 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=40 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=104 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=120 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=76 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=75 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=72 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=59 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=62 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=13 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=44 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=121 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
[2022-02-07 05:40:39,834] [INFO] [engine.py:151:__init__] RANK=15 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731388800 (104731.389M) UNIQUE_PARAMS=104048288000 (104048.288M)
 > using checkpoint value 6e-05 for learning rate
 > using checkpoint value 6e-06 for minimum learning rate
 > using checkpoint value 3750000 for warmup iterations
 > using checkpoint value 600000000 for total number of iterations
 > using checkpoint value cosine for decay style
[2022-02-07 05:40:59,682] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 56
[2022-02-07 05:41:01,061] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 56
[2022-02-07
05:41:01,140] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 36 [2022-02-07 05:41:01,377] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 32 [2022-02-07 05:41:01,557] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 88 [2022-02-07 05:41:02,037] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 64 [2022-02-07 05:41:02,605] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 36 [2022-02-07 05:41:02,836] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 35 [2022-02-07 05:41:02,915] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 32 [2022-02-07 05:41:03,083] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 88 [2022-02-07 05:41:03,155] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 91 [2022-02-07 05:41:03,398] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 90 [2022-02-07 05:41:03,434] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 64 [2022-02-07 05:41:03,863] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 92 [2022-02-07 05:41:04,205] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 94 [2022-02-07 05:41:04,264] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 35 [2022-02-07 05:41:04,350] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 39 [2022-02-07 05:41:04,625] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 93 [2022-02-07 05:41:04,653] 
[INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 89 [2022-02-07 05:41:04,676] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 102 [2022-02-07 05:41:04,776] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 91 [2022-02-07 05:41:05,045] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 90 [2022-02-07 05:41:05,200] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 38 [2022-02-07 05:41:05,246] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 95 [2022-02-07 05:41:05,531] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 92 [2022-02-07 05:41:05,640] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 34 [2022-02-07 05:41:05,737] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 30 [2022-02-07 05:41:05,742] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 39 [2022-02-07 05:41:05,904] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 97 [2022-02-07 05:41:05,910] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 94 [2022-02-07 05:41:06,018] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 53 [2022-02-07 05:41:06,051] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 101 [2022-02-07 05:41:06,136] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 102 [2022-02-07 05:41:06,231] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 89 [2022-02-07 05:41:06,318] [INFO] 
[engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 93 [2022-02-07 05:41:06,390] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 37 [2022-02-07 05:41:06,568] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 33 [2022-02-07 05:41:06,623] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 96 [2022-02-07 05:41:06,672] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 60 [2022-02-07 05:41:06,826] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 95 [2022-02-07 05:41:06,830] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 38 [2022-02-07 05:41:07,004] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 61 [2022-02-07 05:41:07,084] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 34 [2022-02-07 05:41:07,122] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 30 [2022-02-07 05:41:07,202] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 121 [2022-02-07 05:41:07,262] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 28 [2022-02-07 05:41:07,360] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 97 [2022-02-07 05:41:07,398] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 100 [2022-02-07 05:41:07,456] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 53 [2022-02-07 05:41:07,461] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 101 [2022-02-07 05:41:07,738] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 84 [2022-02-07 05:41:07,743] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 99 [2022-02-07 05:41:07,805] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 44 [2022-02-07 05:41:07,816] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 42 [2022-02-07 05:41:07,875] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 37 [2022-02-07 05:41:07,917] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 40 [2022-02-07 05:41:07,939] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 43 [2022-02-07 05:41:08,105] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 96 [2022-02-07 05:41:08,122] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 60 [2022-02-07 05:41:08,175] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 33 [2022-02-07 05:41:08,223] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 120 [2022-02-07 05:41:08,571] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 61 [2022-02-07 05:41:08,606] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 11 [2022-02-07 05:41:08,651] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 28 [2022-02-07 05:41:08,731] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 121 [2022-02-07 05:41:08,757] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 76 [2022-02-07 05:41:08,827] [INFO] 
[engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 100 [2022-02-07 05:41:08,835] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 55 [2022-02-07 05:41:08,944] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 58 [2022-02-07 05:41:08,993] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 5 [2022-02-07 05:41:09,126] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 41 [2022-02-07 05:41:09,196] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 99 [2022-02-07 05:41:09,311] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 44 [2022-02-07 05:41:09,355] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 48 [2022-02-07 05:41:09,373] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 84 [2022-02-07 05:41:09,388] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 4 [2022-02-07 05:41:09,436] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 46 [2022-02-07 05:41:09,454] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 70 [2022-02-07 05:41:09,510] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 52 [2022-02-07 05:41:09,524] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 40 [2022-02-07 05:41:09,547] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 42 [2022-02-07 05:41:09,666] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 43 [2022-02-07 05:41:09,739] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 103 [2022-02-07 05:41:09,749] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 47 [2022-02-07 05:41:09,770] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 120 [2022-02-07 05:41:09,806] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 108 [2022-02-07 05:41:09,811] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 63 [2022-02-07 05:41:09,880] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 72 [2022-02-07 05:41:10,032] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 11 [2022-02-07 05:41:10,189] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 76 [2022-02-07 05:41:10,340] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 55 [2022-02-07 05:41:10,341] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 58 [2022-02-07 05:41:10,415] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 8 [2022-02-07 05:41:10,434] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 107 [2022-02-07 05:41:10,542] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 106 [2022-02-07 05:41:10,633] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 5 [2022-02-07 05:41:10,763] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 48 [2022-02-07 05:41:10,817] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 62 [2022-02-07 05:41:10,909] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 65 [2022-02-07 05:41:10,936] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 4 [2022-02-07 05:41:10,980] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 52 [2022-02-07 05:41:10,990] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 49 [2022-02-07 05:41:10,998] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 45 [2022-02-07 05:41:11,013] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 41 [2022-02-07 05:41:11,021] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 70 [2022-02-07 05:41:11,063] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 26 [2022-02-07 05:41:11,107] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 103 [2022-02-07 05:41:11,125] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 46 [2022-02-07 05:41:11,132] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 104 [2022-02-07 05:41:11,234] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 63 [2022-02-07 05:41:11,267] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 108 [2022-02-07 05:41:11,298] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 66 [2022-02-07 05:41:11,302] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 57 [2022-02-07 05:41:11,320] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 72 [2022-02-07 05:41:11,345] [INFO] 
[engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 47 [2022-02-07 05:41:11,371] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 59 [2022-02-07 05:41:11,643] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 85 [2022-02-07 05:41:11,702] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 68 [2022-02-07 05:41:11,826] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 16 [2022-02-07 05:41:11,847] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 67 [2022-02-07 05:41:11,865] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 105 [2022-02-07 05:41:11,906] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 8 [2022-02-07 05:41:12,077] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 109 [2022-02-07 05:41:12,145] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 124 [2022-02-07 05:41:12,147] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 12 [2022-02-07 05:41:12,196] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 10 [2022-02-07 05:41:12,206] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 31 [2022-02-07 05:41:12,230] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 62 [2022-02-07 05:41:12,237] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 50 [2022-02-07 05:41:12,262] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 107 [2022-02-07 05:41:12,282] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 51 [2022-02-07 05:41:12,288] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 106 [2022-02-07 05:41:12,368] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 54 [2022-02-07 05:41:12,370] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 45 [2022-02-07 05:41:12,428] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 110 [2022-02-07 05:41:12,436] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 75 [2022-02-07 05:41:12,444] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 1 [2022-02-07 05:41:12,451] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 65 [2022-02-07 05:41:12,459] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 80 [2022-02-07 05:41:12,510] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 26 [2022-02-07 05:41:12,524] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 49 [2022-02-07 05:41:12,559] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 116 [2022-02-07 05:41:12,676] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 13 [2022-02-07 05:41:12,719] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 22 [2022-02-07 05:41:12,738] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 83 [2022-02-07 05:41:12,749] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 71 [2022-02-07 05:41:12,750] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 98 [2022-02-07 05:41:12,765] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 25 [2022-02-07 05:41:12,790] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 59 [2022-02-07 05:41:12,816] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 57 [2022-02-07 05:41:12,879] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 86 [2022-02-07 05:41:12,883] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 21 [2022-02-07 05:41:12,917] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 69 [2022-02-07 05:41:12,933] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 82 [2022-02-07 05:41:12,937] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 118 [2022-02-07 05:41:12,986] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 78 [2022-02-07 05:41:13,076] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 9 [2022-02-07 05:41:13,121] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 104 [2022-02-07 05:41:13,179] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 66 [2022-02-07 05:41:13,198] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 27 [2022-02-07 05:41:13,240] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 85 [2022-02-07 05:41:13,316] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 87 [2022-02-07 05:41:13,328] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 81 [2022-02-07 05:41:13,350] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 20 [2022-02-07 05:41:13,361] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 68 [2022-02-07 05:41:13,368] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 113 [2022-02-07 05:41:13,373] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 16 [2022-02-07 05:41:13,373] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 112 [2022-02-07 05:41:13,427] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 77 [2022-02-07 05:41:13,458] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 29 [2022-02-07 05:41:13,480] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 105 [2022-02-07 05:41:13,505] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 67 [2022-02-07 05:41:13,519] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 114 [2022-02-07 05:41:13,560] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 17 [2022-02-07 05:41:13,601] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 14 [2022-02-07 05:41:13,608] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 109 [2022-02-07 05:41:13,676] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 24 [2022-02-07 05:41:13,714] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 31 [2022-02-07 05:41:13,782] [INFO] 
[engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 7 [2022-02-07 05:41:13,812] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 10 [2022-02-07 05:41:13,870] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 75 [2022-02-07 05:41:13,901] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 19 [2022-02-07 05:41:13,934] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 119 [2022-02-07 05:41:13,951] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 110 [2022-02-07 05:41:13,977] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 15 [2022-02-07 05:41:13,981] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 50 [2022-02-07 05:41:13,996] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 74 [2022-02-07 05:41:14,027] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 51 [2022-02-07 05:41:14,035] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 117 [2022-02-07 05:41:14,067] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 12 [2022-02-07 05:41:14,105] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 124 [2022-02-07 05:41:14,143] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 71 [2022-02-07 05:41:14,176] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 23 [2022-02-07 05:41:14,214] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 18 [2022-02-07 05:41:14,268] [INFO] 
[engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 25 [2022-02-07 05:41:14,310] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 73 [2022-02-07 05:41:14,310] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 116 [2022-02-07 05:41:14,319] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 1 [2022-02-07 05:41:14,326] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 125 [2022-02-07 05:41:14,327] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 98 [2022-02-07 05:41:14,329] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 79 [2022-02-07 05:41:14,350] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 54 [2022-02-07 05:41:14,428] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 22 [2022-02-07 05:41:14,428] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 69 [2022-02-07 05:41:14,558] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 78 [2022-02-07 05:41:14,594] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 122 [2022-02-07 05:41:14,601] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 21 [2022-02-07 05:41:14,609] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 80 [2022-02-07 05:41:14,648] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 126 [2022-02-07 05:41:14,655] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 115 [2022-02-07 05:41:14,661] [INFO] 
[engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 13 [2022-02-07 05:41:14,724] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 118 [2022-02-07 05:41:14,764] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 82 [2022-02-07 05:41:14,818] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 9 [2022-02-07 05:41:14,877] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 3 [2022-02-07 05:41:14,898] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 127 [2022-02-07 05:41:14,902] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 2 [2022-02-07 05:41:14,908] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 123 [2022-02-07 05:41:14,913] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 77 [2022-02-07 05:41:14,940] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 111 [2022-02-07 05:41:14,949] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 81 [2022-02-07 05:41:15,056] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 83 [2022-02-07 05:41:15,159] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 24 [2022-02-07 05:41:15,218] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 0 [2022-02-07 05:41:15,247] [INFO] [engine.py:2688:_get_all_zero_checkpoints] successfully read 1 ZeRO state_dicts for rank 6 [2022-02-07 05:41:15,343] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 14 [2022-02-07 05:41:15,345] [INFO] 
[engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 112 [2022-02-07 05:41:15,365] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 113 [2022-02-07 05:41:15,379] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 114 [2022-02-07 05:41:15,399] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 87 [2022-02-07 05:41:15,435] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 27 [2022-02-07 05:41:15,558] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 20 [2022-02-07 05:41:15,582] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 7 [2022-02-07 05:41:15,614] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 86 [2022-02-07 05:41:15,702] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 119 [2022-02-07 05:41:15,707] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 15 [2022-02-07 05:41:15,713] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 117 [2022-02-07 05:41:15,776] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 29 [2022-02-07 05:41:15,778] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 73 [2022-02-07 05:41:15,894] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 74 [2022-02-07 05:41:15,928] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 18 [2022-02-07 05:41:15,962] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 23 [2022-02-07 05:41:16,101] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero 
partition checkpoints for rank 79
[2022-02-07 05:41:16,177] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 125
[2022-02-07 05:41:16,196] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 115
[2022-02-07 05:41:16,323] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 111
[2022-02-07 05:41:16,575] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 123
[2022-02-07 05:41:16,616] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 122
[2022-02-07 05:41:16,735] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 17
[2022-02-07 05:41:16,842] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 19
[2022-02-07 05:41:16,880] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 127
[2022-02-07 05:41:16,980] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 2
[2022-02-07 05:41:17,088] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 126
[2022-02-07 05:41:17,157] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 0
 checkpoint version 3.0
[2022-02-07 05:41:17,226] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 6
[2022-02-07 05:41:17,287] [INFO] [engine.py:2618:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 3
 successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100 at iteration 17250
time (ms) | load-checkpoint: 36157.49
/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed/megatron/utils.py:279: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
  warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
estimated model parameters: 125.22432
estimated model parameters without embeddings: 103.368064
[after model, optimizer, and learning rate scheduler are built] datetime: 2022-02-07 05:41:17
> building train, validation, and test datasets ...
 > datasets target sizes (minimum size):
    train: 600000000
    validation: 20008960
    test: 10240
> building train, validation, and test datasets for GPT ...
 > building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
 > finished creating indexed dataset in 0.061047 seconds
    number of documents: 304230423
 > dataset split:
    train:
     document indices in [0, 288714672) total of 288714672 documents
    validation:
     document indices in [288714672, 303926193) total of 15211521 documents
    test:
     document indices in [303926193, 304230423) total of 304230 documents
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_shuffle_idx.npy
    loaded indexed file in 0.084 seconds
    total number of samples: 657686117
    total number of epochs: 5
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_20008960ns_2048sl_43s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_20008960ns_2048sl_43s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_20008960ns_2048sl_43s_shuffle_idx.npy
    loaded indexed file in 0.154 seconds
    total number of samples: 20781483
    total number of epochs: 3
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_shuffle_idx.npy
    loaded indexed file in 0.083 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2022-02-07 05:41:24
done with setup ...
training ...
Number of parameters: [tensor rank - pipeline rank] w/ and w/o embeddings:
time (ms) | model-and-optimizer-setup: 47652.69 | train/valid/test-data-iterators-setup: 6760.03
[003-001] 103.3651B / 103.3651B
[001-001] 103.3651B / 103.3651B
[002-001] 103.3651B / 103.3651B
[002-000] 125.2243B / 103.3681B
[001-000] 125.2243B / 103.3681B
[003-030] 103.3651B / 103.3651B
[003-016] 103.3651B / 103.3651B
[001-016] 103.3651B / 103.3651B
[002-030] 103.3651B / 103.3651B
[003-003] 103.3651B / 103.3651B
[003-009] 103.3651B / 103.3651B
[001-009] 103.3651B / 103.3651B
[002-008] 103.3651B / 103.3651B
[002-009] 103.3651B / 103.3651B
[002-011] 103.3651B / 103.3651B
[001-010] 103.3651B / 103.3651B
[003-004] 103.3651B / 103.3651B
[003-005] 103.3651B / 103.3651B
[001-028] 103.3651B / 103.3651B
[003-028] 103.3651B / 103.3651B
[003-029] 103.3651B / 103.3651B
[002-018] 103.3651B / 103.3651B
[003-018] 103.3651B / 103.3651B
[003-017] 103.3651B / 103.3651B
[002-016] 103.3651B / 103.3651B
[002-017] 103.3651B / 103.3651B
[001-017] 103.3651B / 103.3651B
[001-021] 103.3651B / 103.3651B
[002-020]
103.3651B / 103.3651B[003-021] 103.3651B / 103.3651B [001-006] 103.3651B / 103.3651B[001-007] 103.3651B / 103.3651B [003-006] 103.3651B / 103.3651B [003-015] 103.3651B / 103.3651B[001-015] 103.3651B / 103.3651B[001-014] 103.3651B / 103.3651B [003-000] 125.2243B / 103.3681B [001-005] 103.3651B / 103.3651B[001-004] 103.3651B / 103.3651B [003-024] 103.3651B / 103.3651B[001-024] 103.3651B / 103.3651B [002-028] 103.3651B / 103.3651B [001-026] 103.3651B / 103.3651B[003-026] 103.3651B / 103.3651B [002-026] 103.3651B / 103.3651B [001-023] 103.3651B / 103.3651B [001-022] 103.3651B / 103.3651B[002-023] 103.3651B / 103.3651B[002-022] 103.3651B / 103.3651B [002-031] 125.2273B / 103.3710B [002-013] 103.3651B / 103.3651B[003-012] 103.3651B / 103.3651B [002-012] 103.3651B / 103.3651B [002-019] 103.3651B / 103.3651B [002-002] 103.3651B / 103.3651B [003-002] 103.3651B / 103.3651B [001-003] 103.3651B / 103.3651B [003-010] 103.3651B / 103.3651B [003-011] 103.3651B / 103.3651B [002-004] 103.3651B / 103.3651B[002-005] 103.3651B / 103.3651B [002-025] 103.3651B / 103.3651B [001-025] 103.3651B / 103.3651B[002-024] 103.3651B / 103.3651B [002-029] 103.3651B / 103.3651B [001-031] 125.2273B / 103.3710B [003-013] 103.3651B / 103.3651B [001-019] 103.3651B / 103.3651B [001-018] 103.3651B / 103.3651B[003-019] 103.3651B / 103.3651B [000-016] 103.3651B / 103.3651B [002-006] 103.3651B / 103.3651B [001-002] 103.3651B / 103.3651B [002-003] 103.3651B / 103.3651B [001-008] 103.3651B / 103.3651B [003-008] 103.3651B / 103.3651B [002-010] 103.3651B / 103.3651B [001-011] 103.3651B / 103.3651B [003-025] 103.3651B / 103.3651B [001-029] 103.3651B / 103.3651B [001-030] 103.3651B / 103.3651B [001-013] 103.3651B / 103.3651B [001-012] 103.3651B / 103.3651B [001-020] 103.3651B / 103.3651B [002-021] 103.3651B / 103.3651B [003-007] 103.3651B / 103.3651B [002-007] 103.3651B / 103.3651B [002-015] 103.3651B / 103.3651B[002-014] 103.3651B / 103.3651B [000-005] 103.3651B / 103.3651B[000-004] 103.3651B / 103.3651B 
[000-028] 103.3651B / 103.3651B [003-027] 103.3651B / 103.3651B [002-027] 103.3651B / 103.3651B [001-027] 103.3651B / 103.3651B [003-023] 103.3651B / 103.3651B [003-022] 103.3651B / 103.3651B [003-031] 125.2273B / 103.3710B [000-012] 103.3651B / 103.3651B [003-020] 103.3651B / 103.3651B [000-007] 103.3651B / 103.3651B [003-014] 103.3651B / 103.3651B [000-001] 103.3651B / 103.3651B [000-029] 103.3651B / 103.3651B [000-023] 103.3651B / 103.3651B [000-013] 103.3651B / 103.3651B [000-017] 103.3651B / 103.3651B [000-021] 103.3651B / 103.3651B [000-014] 103.3651B / 103.3651B [000-025] 103.3651B / 103.3651B [000-020] 103.3651B / 103.3651B [000-015] 103.3651B / 103.3651B [000-000] 125.2243B / 103.3681B [000-008] 103.3651B / 103.3651B [000-002] 103.3651B / 103.3651B [000-026] 103.3651B / 103.3651B [000-006] 103.3651B / 103.3651B [000-024] 103.3651B / 103.3651B [000-003] 103.3651B / 103.3651B [000-030] 103.3651B / 103.3651B [000-018] 103.3651B / 103.3651B [000-009] 103.3651B / 103.3651B [000-027] 103.3651B / 103.3651B [000-010] 103.3651B / 103.3651B [000-022] 103.3651B / 103.3651B [000-019] 103.3651B / 103.3651B [000-011] 103.3651B / 103.3651B [000-031] 125.2273B / 103.3710B [before the start of training step] datetime: 2022-02-07 05:41:24 [2022-02-07 05:41:24,864] [INFO] [checkpointing.py:547:forward] Activation Checkpointing Information [2022-02-07 05:41:24,864] [INFO] [checkpointing.py:548:forward] ----Partition Activations False, CPU CHECKPOINTING False [2022-02-07 05:41:24,864] [INFO] [checkpointing.py:551:forward] ----contiguous Memory Checkpointing False with 64 total layers [2022-02-07 05:41:24,864] [INFO] [checkpointing.py:554:forward] ----Synchronization False [2022-02-07 05:41:24,864] [INFO] [checkpointing.py:555:forward] ----Profiling time in checkpointing False [Rank 24] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 20] (after 17251 iterations) memory (MB) 
| allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 16] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 0] (after 17251 iterations) memory (MB) | allocated: 13208.908203125 | max allocated: 20672.5244140625 | reserved: 24404.0 | max reserved: 24404.0 [Rank 4] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 28] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 12] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 124] (after 17251 iterations) memory (MB) | allocated: 13262.15576171875 | max allocated: 20725.81689453125 | reserved: 24404.0 | max reserved: 24404.0 [Rank 8] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 32] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 40] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 44] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.7763671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 52] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 48] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 64] 
(after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.7763671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 56] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 36] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 60] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 68] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 72] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 76] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 84] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.7763671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 96] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 88] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 80] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 100] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 92] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 
20072.0 | max reserved: 20072.0 [Rank 104] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.7763671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 120] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 108] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 112] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 116] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 126] (after 17251 iterations) memory (MB) | allocated: 13262.15576171875 | max allocated: 20726.50439453125 | reserved: 24404.0 | max reserved: 24404.0 [Rank 122] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 14] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 10] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 2] (after 17251 iterations) memory (MB) | allocated: 13207.3203125 | max allocated: 20670.9365234375 | reserved: 24404.0 | max reserved: 24404.0 [Rank 30] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 22] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 34] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max 
allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 18] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 6] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 46] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.7763671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 50] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 38] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 54] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 26] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 42] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 58] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 62] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 66] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 74] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 82] (after 17251 iterations) memory 
(MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 78] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 86] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.7763671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 90] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 70] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 94] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 102] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 98] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 19] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 7] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 11] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 3] (after 17251 iterations) memory (MB) | allocated: 13208.896484375 | max allocated: 20672.5126953125 | reserved: 24404.0 | max reserved: 24404.0 [Rank 39] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 
123] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 31] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 15] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 27] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 23] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 55] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 35] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 43] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 63] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 47] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.7763671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 51] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 79] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 59] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | 
reserved: 20072.0 | max reserved: 20072.0 [Rank 71] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 67] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.7763671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 75] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 83] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 87] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 95] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 91] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 99] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 103] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 118] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 114] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 110] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 106] (after 17251 iterations) memory (MB) | allocated: 
10797.55712890625 | max allocated: 16957.7763671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 5] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 1] (after 17251 iterations) memory (MB) | allocated: 13208.908203125 | max allocated: 20672.5244140625 | reserved: 24404.0 | max reserved: 24404.0 [Rank 9] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 21] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 33] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 125] (after 17251 iterations) memory (MB) | allocated: 13262.15576171875 | max allocated: 20725.81689453125 | reserved: 24404.0 | max reserved: 24404.0 [Rank 25] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 13] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 17] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 37] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16958.53662109375 | reserved: 20072.0 | max reserved: 20072.0 [Rank 29] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 iteration 17251/ 292968 | consumed samples: 35330048 | consumed tokens: 18915475456 | elapsed time per iteration (ms): 241618.9 | learning rate: 5.930E-05 | global batch size: 2048 
| lm loss: 2.881958E+00 | loss scale: 32768.0 | grad norm: 36423.291 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.008 | TFLOPs: 55.20 | [Rank 127] (after 17251 iterations) memory (MB) | allocated: 13262.15576171875 | max allocated: 20725.81689453125 | reserved: 24404.0 | max reserved: 24404.0 [Rank 107] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.7763671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 115] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 119] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 111] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.7763671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 53] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 45] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 41] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 49] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 65] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 77] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 69] (after 17251 iterations) memory (MB) 
| allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 61] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 57] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 85] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.7763671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 81] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 101] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 73] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 93] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 109] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 105] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.7763671875 | reserved: 20072.0 | max reserved: 20072.0 [Rank 97] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 117] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 89] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 
[Rank 113] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 [Rank 121] (after 17251 iterations) memory (MB) | allocated: 10797.55712890625 | max allocated: 16957.73876953125 | reserved: 20072.0 | max reserved: 20072.0 iteration 17252/ 292968 | consumed samples: 35332096 | consumed tokens: 18917539840 | elapsed time per iteration (ms): 197482.9 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.884583E+00 | loss scale: 32768.0 | grad norm: 33083.424 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.010 | TFLOPs: 67.54 | iteration 17253/ 292968 | consumed samples: 35334144 | consumed tokens: 18919604224 | elapsed time per iteration (ms): 185642.5 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.873677E+00 | loss scale: 32768.0 | grad norm: 107982.479 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.011 | TFLOPs: 71.84 | iteration 17254/ 292968 | consumed samples: 35336192 | consumed tokens: 18921668608 | elapsed time per iteration (ms): 173802.4 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.931194E+00 | loss scale: 32768.0 | grad norm: 78127.386 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 76.74 | iteration 17255/ 292968 | consumed samples: 35338240 | consumed tokens: 18923732992 | elapsed time per iteration (ms): 168546.0 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.929385E+00 | loss scale: 32768.0 | grad norm: 61444.343 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 79.13 | iteration 17256/ 292968 | consumed samples: 35340288 | consumed 
tokens: 18925797376 | elapsed time per iteration (ms): 164538.1 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.933079E+00 | loss scale: 32768.0 | grad norm: 68630.203 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 81.06 | iteration 17257/ 292968 | consumed samples: 35342336 | consumed tokens: 18927861760 | elapsed time per iteration (ms): 162740.2 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.938017E+00 | loss scale: 32768.0 | grad norm: 33167.748 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 81.95 | iteration 17258/ 292968 | consumed samples: 35344384 | consumed tokens: 18929926144 | elapsed time per iteration (ms): 162232.5 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.906032E+00 | loss scale: 32768.0 | grad norm: 51000.192 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.21 | iteration 17259/ 292968 | consumed samples: 35346432 | consumed tokens: 18931990528 | elapsed time per iteration (ms): 161003.1 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.915025E+00 | loss scale: 32768.0 | grad norm: 44542.384 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.84 | iteration 17260/ 292968 | consumed samples: 35348480 | consumed tokens: 18934054912 | elapsed time per iteration (ms): 162604.9 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.942188E+00 | loss scale: 32768.0 | grad norm: 42264.080 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.02 | iteration 17261/ 292968 | consumed samples: 
35350528 | consumed tokens: 18936119296 | elapsed time per iteration (ms): 161815.6 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.940590E+00 | loss scale: 32768.0 | grad norm: 46759.260 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.42 | iteration 17262/ 292968 | consumed samples: 35352576 | consumed tokens: 18938183680 | elapsed time per iteration (ms): 162741.4 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.940148E+00 | loss scale: 32768.0 | grad norm: 98734.496 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 81.95 | iteration 17263/ 292968 | consumed samples: 35354624 | consumed tokens: 18940248064 | elapsed time per iteration (ms): 161757.3 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.948946E+00 | loss scale: 32768.0 | grad norm: 43192.185 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.45 | iteration 17264/ 292968 | consumed samples: 35356672 | consumed tokens: 18942312448 | elapsed time per iteration (ms): 161671.1 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.964389E+00 | loss scale: 32768.0 | grad norm: 46955.204 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.50 | iteration 17265/ 292968 | consumed samples: 35358720 | consumed tokens: 18944376832 | elapsed time per iteration (ms): 162774.6 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.943272E+00 | loss scale: 32768.0 | grad norm: 57972.562 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 81.94 | iteration 17266/ 292968 | 
consumed samples: 35360768 | consumed tokens: 18946441216 | elapsed time per iteration (ms): 161949.9 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.979730E+00 | loss scale: 32768.0 | grad norm: 40438.377 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.35 | iteration 17267/ 292968 | consumed samples: 35362816 | consumed tokens: 18948505600 | elapsed time per iteration (ms): 161577.8 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.954692E+00 | loss scale: 32768.0 | grad norm: 77402.659 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.54 | iteration 17268/ 292968 | consumed samples: 35364864 | consumed tokens: 18950569984 | elapsed time per iteration (ms): 162711.6 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.925129E+00 | loss scale: 32768.0 | grad norm: 53653.079 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 81.97 | iteration 17269/ 292968 | consumed samples: 35366912 | consumed tokens: 18952634368 | elapsed time per iteration (ms): 162856.7 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.947999E+00 | loss scale: 32768.0 | grad norm: 55139.254 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 81.90 | iteration 17270/ 292968 | consumed samples: 35368960 | consumed tokens: 18954698752 | elapsed time per iteration (ms): 166978.6 | learning rate: 5.930E-05 | global batch size: 2048 | lm loss: 2.936876E+00 | loss scale: 32768.0 | grad norm: 38935.407 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 79.87 | iteration 
17271/ 292968 | consumed samples: 35371008 | consumed tokens: 18956763136 | elapsed time per iteration (ms): 161562.7 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.921906E+00 | loss scale: 32768.0 | grad norm: 55633.865 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.55 | iteration 17272/ 292968 | consumed samples: 35373056 | consumed tokens: 18958827520 | elapsed time per iteration (ms): 161344.4 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.927245E+00 | loss scale: 32768.0 | grad norm: 34244.661 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.66 | iteration 17273/ 292968 | consumed samples: 35375104 | consumed tokens: 18960891904 | elapsed time per iteration (ms): 161360.7 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.914408E+00 | loss scale: 32768.0 | grad norm: 77523.057 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.66 | iteration 17274/ 292968 | consumed samples: 35377152 | consumed tokens: 18962956288 | elapsed time per iteration (ms): 161309.5 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.903873E+00 | loss scale: 32768.0 | grad norm: 43576.376 | num zeros: 0.0 | curriculum seqlen: 1008 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.68 | iteration 17275/ 292968 | consumed samples: 35379200 | consumed tokens: 18965037056 | elapsed time per iteration (ms): 163421.3 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.952730E+00 | loss scale: 32768.0 | grad norm: 75778.950 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 
82.26 | iteration 17276/ 292968 | consumed samples: 35381248 | consumed tokens: 18967117824 | elapsed time per iteration (ms): 163058.8 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.943199E+00 | loss scale: 32768.0 | grad norm: 50981.079 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.44 | iteration 17277/ 292968 | consumed samples: 35383296 | consumed tokens: 18969198592 | elapsed time per iteration (ms): 163300.0 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.923966E+00 | loss scale: 32768.0 | grad norm: 57623.921 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.32 | iteration 17278/ 292968 | consumed samples: 35385344 | consumed tokens: 18971279360 | elapsed time per iteration (ms): 163487.0 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.939858E+00 | loss scale: 32768.0 | grad norm: 42734.109 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.23 | iteration 17279/ 292968 | consumed samples: 35387392 | consumed tokens: 18973360128 | elapsed time per iteration (ms): 163356.4 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.914700E+00 | loss scale: 32768.0 | grad norm: 47481.069 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.29 | iteration 17280/ 292968 | consumed samples: 35389440 | consumed tokens: 18975440896 | elapsed time per iteration (ms): 163444.0 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.916293E+00 | loss scale: 32768.0 | grad norm: 48996.379 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
0.013 | TFLOPs: 82.25 | iteration 17281/ 292968 | consumed samples: 35391488 | consumed tokens: 18977521664 | elapsed time per iteration (ms): 163633.6 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.917744E+00 | loss scale: 32768.0 | grad norm: 39680.357 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.15 | iteration 17282/ 292968 | consumed samples: 35393536 | consumed tokens: 18979602432 | elapsed time per iteration (ms): 163515.2 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.926173E+00 | loss scale: 32768.0 | grad norm: 43036.749 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.21 | iteration 17283/ 292968 | consumed samples: 35395584 | consumed tokens: 18981683200 | elapsed time per iteration (ms): 162973.7 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.914520E+00 | loss scale: 32768.0 | grad norm: 47964.043 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.49 | iteration 17284/ 292968 | consumed samples: 35397632 | consumed tokens: 18983763968 | elapsed time per iteration (ms): 162846.7 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.887606E+00 | loss scale: 32768.0 | grad norm: 44776.658 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.55 | iteration 17285/ 292968 | consumed samples: 35399680 | consumed tokens: 18985844736 | elapsed time per iteration (ms): 163675.0 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.908758E+00 | loss scale: 32768.0 | grad norm: 30489.779 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 0.013 | TFLOPs: 82.13 | iteration 17286/ 292968 | consumed samples: 35401728 | consumed tokens: 18987925504 | elapsed time per iteration (ms): 162850.7 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.960663E+00 | loss scale: 32768.0 | grad norm: 64641.117 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.55 | iteration 17287/ 292968 | consumed samples: 35403776 | consumed tokens: 18990006272 | elapsed time per iteration (ms): 162910.0 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.925203E+00 | loss scale: 32768.0 | grad norm: 56328.510 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.52 | iteration 17288/ 292968 | consumed samples: 35405824 | consumed tokens: 18992087040 | elapsed time per iteration (ms): 163235.8 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.939239E+00 | loss scale: 32768.0 | grad norm: 42143.334 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.35 | iteration 17289/ 292968 | consumed samples: 35407872 | consumed tokens: 18994167808 | elapsed time per iteration (ms): 164350.0 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.907236E+00 | loss scale: 32768.0 | grad norm: 72760.200 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 81.80 | iteration 17290/ 292968 | consumed samples: 35409920 | consumed tokens: 18996248576 | elapsed time per iteration (ms): 163273.7 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.963417E+00 | loss scale: 32768.0 | grad norm: 45450.266 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 0.013 | TFLOPs: 82.34 | iteration 17291/ 292968 | consumed samples: 35411968 | consumed tokens: 18998329344 | elapsed time per iteration (ms): 163279.4 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.937544E+00 | loss scale: 32768.0 | grad norm: 70200.251 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.33 | iteration 17292/ 292968 | consumed samples: 35414016 | consumed tokens: 19000410112 | elapsed time per iteration (ms): 162702.3 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.938456E+00 | loss scale: 32768.0 | grad norm: 66002.005 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.62 | iteration 17293/ 292968 | consumed samples: 35416064 | consumed tokens: 19002490880 | elapsed time per iteration (ms): 163339.7 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.932324E+00 | loss scale: 32768.0 | grad norm: 50822.970 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.30 | iteration 17294/ 292968 | consumed samples: 35418112 | consumed tokens: 19004571648 | elapsed time per iteration (ms): 164007.4 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.945367E+00 | loss scale: 32768.0 | grad norm: 36976.460 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 81.97 | iteration 17295/ 292968 | consumed samples: 35420160 | consumed tokens: 19006652416 | elapsed time per iteration (ms): 163226.6 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.899406E+00 | loss scale: 32768.0 | grad norm: 48040.146 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 
| number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.36 | iteration 17296/ 292968 | consumed samples: 35422208 | consumed tokens: 19008733184 | elapsed time per iteration (ms): 164330.3 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.925587E+00 | loss scale: 32768.0 | grad norm: 43306.938 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 81.81 | iteration 17297/ 292968 | consumed samples: 35424256 | consumed tokens: 19010813952 | elapsed time per iteration (ms): 162905.5 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.913133E+00 | loss scale: 32768.0 | grad norm: 53440.343 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.52 | iteration 17298/ 292968 | consumed samples: 35426304 | consumed tokens: 19012894720 | elapsed time per iteration (ms): 162840.4 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.927796E+00 | loss scale: 32768.0 | grad norm: 43157.712 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.55 | iteration 17299/ 292968 | consumed samples: 35428352 | consumed tokens: 19014975488 | elapsed time per iteration (ms): 163679.1 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.911553E+00 | loss scale: 32768.0 | grad norm: 47810.858 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.13 | iteration 17300/ 292968 | consumed samples: 35430400 | consumed tokens: 19017056256 | elapsed time per iteration (ms): 163562.8 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.945767E+00 | loss scale: 32768.0 | grad norm: 42366.813 | num zeros: 0.0 | curriculum seqlen: 1016 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.19 |
saving checkpoint at iteration 17300 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
[2022-02-07 07:59:50,322] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/mp_rank_00_model_states.pt
[2022-02-07 07:59:50,335] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/mp_rank_01_model_states.pt
[2022-02-07 08:00:03,340] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_108_optim_states.pt
[2022-02-07 08:00:03,481] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_109_optim_states.pt
[2022-02-07 08:00:04,361] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_107_optim_states.pt
[2022-02-07 08:00:05,299] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_111_optim_states.pt
[2022-02-07 08:00:06,124] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_110_optim_states.pt
[2022-02-07 08:00:06,254] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_106_optim_states.pt
[2022-02-07 08:00:06,588]
[INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_104_optim_states.pt [2022-02-07 08:00:06,602] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_105_optim_states.pt [2022-02-07 08:00:07,242] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-02-07 08:00:09,091] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-02-07 08:00:12,200] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-02-07 08:00:12,575] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-02-07 08:00:12,619] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-02-07 08:00:12,630] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-02-07 08:00:12,765] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_96_optim_states.pt 
[2022-02-07 08:00:13,021] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-02-07 08:00:13,464] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-02-07 08:00:13,561] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-02-07 08:00:13,844] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-02-07 08:00:14,432] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-02-07 08:00:14,620] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-02-07 08:00:15,250] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-02-07 08:00:15,322] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-02-07 08:00:15,456] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-02-07 08:00:15,562] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-02-07 08:00:15,719] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-02-07 08:00:15,784] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-02-07 08:00:15,976] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-02-07 08:00:16,081] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-02-07 08:00:16,124] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-02-07 08:00:16,598] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-02-07 08:00:16,649] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-02-07 08:00:16,699] [INFO] [engine.py:3023:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-02-07 08:00:16,747] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-02-07 08:00:16,768] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-02-07 08:00:16,781] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-02-07 08:00:16,780] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-02-07 08:00:16,839] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-02-07 08:00:16,952] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-02-07 08:00:17,132] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-02-07 08:00:17,526] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-02-07 08:00:17,501] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-02-07 08:00:17,965] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-02-07 08:00:18,036] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-02-07 08:00:18,040] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-02-07 08:00:18,159] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-02-07 08:00:18,167] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-02-07 08:00:18,217] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-02-07 08:00:18,207] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-02-07 08:00:18,276] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-02-07 
08:00:18,315] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-02-07 08:00:18,387] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-02-07 08:00:18,508] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-02-07 08:00:18,510] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-02-07 08:00:18,584] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-02-07 08:00:18,656] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-02-07 08:00:18,711] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-02-07 08:00:18,774] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-02-07 08:00:18,772] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-02-07 08:00:18,972] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-02-07 08:00:19,217] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-02-07 08:00:19,235] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-02-07 08:00:19,287] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-02-07 08:00:19,470] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-02-07 08:00:19,561] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-02-07 08:00:19,634] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-02-07 08:00:19,630] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-02-07 08:00:19,668] [INFO] [engine.py:3023:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-02-07 08:00:19,884] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-02-07 08:00:19,984] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-02-07 08:00:20,077] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-02-07 08:00:20,174] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-02-07 08:00:20,233] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-02-07 08:00:20,318] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-02-07 08:00:20,334] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-02-07 08:00:20,364] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-02-07 08:00:20,392] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-02-07 08:00:20,439] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-02-07 08:00:20,421] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-02-07 08:00:20,465] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-02-07 08:00:20,461] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-02-07 08:00:20,467] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-02-07 08:00:20,518] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-02-07 08:00:20,518] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-02-07 08:00:20,547] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-02-07 
08:00:20,581] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-02-07 08:00:20,879] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-02-07 08:00:20,890] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-02-07 08:00:20,918] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-02-07 08:00:20,955] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-02-07 08:00:20,964] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-02-07 08:00:21,132] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-02-07 08:00:21,316] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-02-07 08:00:21,513] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-02-07 08:00:21,602] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-02-07 08:00:21,659] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-02-07 08:00:21,679] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-02-07 08:00:21,684] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-02-07 08:00:21,710] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-02-07 08:00:21,732] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-02-07 08:00:21,894] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-02-07 08:00:21,934] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-02-07 08:00:21,971] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-02-07 08:00:22,062] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-02-07 08:00:22,093] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-02-07 08:00:22,122] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-02-07 08:00:22,131] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-02-07 08:00:22,169] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-02-07 08:00:22,191] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-02-07 08:00:22,227] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-02-07 08:00:22,360] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-02-07 08:00:22,370] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-02-07 08:00:22,469] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-02-07 08:00:23,550] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-02-07 08:00:24,262] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-02-07 08:00:24,681] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-02-07 08:00:27,329] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-02-07 08:00:27,546] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-02-07 08:00:27,677] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-02-07 08:00:29,129] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-02-07 
08:00:29,206] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_127_optim_states.pt
[2022-02-07 08:00:29,289] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_120_optim_states.pt
[2022-02-07 08:00:29,335] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_02_optim_states.pt
[2022-02-07 08:00:29,457] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_03_optim_states.pt
[2022-02-07 08:00:30,535] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_49_optim_states.pt
[2022-02-07 08:00:30,562] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_48_optim_states.pt
[2022-02-07 08:00:30,860] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_124_optim_states.pt
[2022-02-07 08:00:31,054] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17300/zero_pp_rank_0_mp_rank_125_optim_states.pt
successfully saved checkpoint at iteration 17300 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
time (ms) | save-checkpoint: 46410.19
iteration 17301/ 292968 | consumed samples: 35432448 |
consumed tokens: 19019137024 | elapsed time per iteration (ms): 217600.2 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.878478E+00 | loss scale: 32768.0 | grad norm: 43432.629 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.009 | TFLOPs: 61.78 | iteration 17302/ 292968 | consumed samples: 35434496 | consumed tokens: 19021217792 | elapsed time per iteration (ms): 163677.5 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.884058E+00 | loss scale: 32768.0 | grad norm: 38822.732 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.13 | iteration 17303/ 292968 | consumed samples: 35436544 | consumed tokens: 19023298560 | elapsed time per iteration (ms): 163673.4 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.904985E+00 | loss scale: 32768.0 | grad norm: 38772.413 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.13 | iteration 17304/ 292968 | consumed samples: 35438592 | consumed tokens: 19025379328 | elapsed time per iteration (ms): 162858.9 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.901158E+00 | loss scale: 32768.0 | grad norm: 51568.717 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.55 | iteration 17305/ 292968 | consumed samples: 35440640 | consumed tokens: 19027460096 | elapsed time per iteration (ms): 162706.3 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.888022E+00 | loss scale: 32768.0 | grad norm: 30708.976 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.62 | iteration 17306/ 292968 | consumed 
samples: 35442688 | consumed tokens: 19029540864 | elapsed time per iteration (ms): 163595.1 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.897484E+00 | loss scale: 32768.0 | grad norm: 54969.793 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.17 | iteration 17307/ 292968 | consumed samples: 35444736 | consumed tokens: 19031621632 | elapsed time per iteration (ms): 162712.0 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.859846E+00 | loss scale: 32768.0 | grad norm: 36721.464 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.62 | iteration 17308/ 292968 | consumed samples: 35446784 | consumed tokens: 19033702400 | elapsed time per iteration (ms): 163026.2 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.883641E+00 | loss scale: 32768.0 | grad norm: 46151.144 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.46 | iteration 17309/ 292968 | consumed samples: 35448832 | consumed tokens: 19035783168 | elapsed time per iteration (ms): 163051.8 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.880014E+00 | loss scale: 32768.0 | grad norm: 44060.007 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.45 | iteration 17310/ 292968 | consumed samples: 35450880 | consumed tokens: 19037863936 | elapsed time per iteration (ms): 163378.9 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.905475E+00 | loss scale: 32768.0 | grad norm: 40145.072 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.28 | iteration 17311/ 
292968 | consumed samples: 35452928 | consumed tokens: 19039944704 | elapsed time per iteration (ms): 163394.7 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.904605E+00 | loss scale: 32768.0 | grad norm: 42498.171 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.27 | iteration 17312/ 292968 | consumed samples: 35454976 | consumed tokens: 19042025472 | elapsed time per iteration (ms): 163921.0 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.881183E+00 | loss scale: 32768.0 | grad norm: 52234.206 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 82.01 | iteration 17313/ 292968 | consumed samples: 35457024 | consumed tokens: 19044106240 | elapsed time per iteration (ms): 163428.8 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.888123E+00 | loss scale: 32768.0 | grad norm: 48010.686 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.26 | iteration 17314/ 292968 | consumed samples: 35459072 | consumed tokens: 19046187008 | elapsed time per iteration (ms): 163662.6 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.929553E+00 | loss scale: 32768.0 | grad norm: 45705.396 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.14 | iteration 17315/ 292968 | consumed samples: 35461120 | consumed tokens: 19048267776 | elapsed time per iteration (ms): 163467.4 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.899248E+00 | loss scale: 32768.0 | grad norm: 52777.076 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.24 | 
iteration 17316/ 292968 | consumed samples: 35463168 | consumed tokens: 19050348544 | elapsed time per iteration (ms): 163808.2 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.892824E+00 | loss scale: 32768.0 | grad norm: 33487.451 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.07 | iteration 17317/ 292968 | consumed samples: 35465216 | consumed tokens: 19052429312 | elapsed time per iteration (ms): 163613.8 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.928116E+00 | loss scale: 32768.0 | grad norm: 59861.292 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.16 | iteration 17318/ 292968 | consumed samples: 35467264 | consumed tokens: 19054510080 | elapsed time per iteration (ms): 163531.6 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.923692E+00 | loss scale: 32768.0 | grad norm: 29024.752 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.21 | iteration 17319/ 292968 | consumed samples: 35469312 | consumed tokens: 19056590848 | elapsed time per iteration (ms): 163552.5 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.930310E+00 | loss scale: 32768.0 | grad norm: 54204.052 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.20 | iteration 17320/ 292968 | consumed samples: 35471360 | consumed tokens: 19058671616 | elapsed time per iteration (ms): 162954.4 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.913774E+00 | loss scale: 32768.0 | grad norm: 37271.945 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | 
TFLOPs: 82.50 | iteration 17321/ 292968 | consumed samples: 35473408 | consumed tokens: 19060752384 | elapsed time per iteration (ms): 164024.7 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.918704E+00 | loss scale: 32768.0 | grad norm: 50401.428 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 81.96 | iteration 17322/ 292968 | consumed samples: 35475456 | consumed tokens: 19062833152 | elapsed time per iteration (ms): 164029.8 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.908293E+00 | loss scale: 32768.0 | grad norm: 39512.930 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 81.96 | iteration 17323/ 292968 | consumed samples: 35477504 | consumed tokens: 19064913920 | elapsed time per iteration (ms): 164969.3 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.880911E+00 | loss scale: 32768.0 | grad norm: 37624.897 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 81.49 | iteration 17324/ 292968 | consumed samples: 35479552 | consumed tokens: 19066994688 | elapsed time per iteration (ms): 164093.9 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.885984E+00 | loss scale: 32768.0 | grad norm: 51917.903 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 81.92 | iteration 17325/ 292968 | consumed samples: 35481600 | consumed tokens: 19069075456 | elapsed time per iteration (ms): 164036.5 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.913517E+00 | loss scale: 32768.0 | grad norm: 29463.111 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 0.012 | TFLOPs: 81.95 | iteration 17326/ 292968 | consumed samples: 35483648 | consumed tokens: 19071156224 | elapsed time per iteration (ms): 163532.1 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.915769E+00 | loss scale: 32768.0 | grad norm: 48097.682 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.21 | iteration 17327/ 292968 | consumed samples: 35485696 | consumed tokens: 19073236992 | elapsed time per iteration (ms): 163495.8 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.864495E+00 | loss scale: 32768.0 | grad norm: 46788.734 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.22 | iteration 17328/ 292968 | consumed samples: 35487744 | consumed tokens: 19075317760 | elapsed time per iteration (ms): 163460.7 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.944571E+00 | loss scale: 32768.0 | grad norm: 46374.396 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.24 | iteration 17329/ 292968 | consumed samples: 35489792 | consumed tokens: 19077398528 | elapsed time per iteration (ms): 163878.0 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.913780E+00 | loss scale: 32768.0 | grad norm: 48075.136 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 82.03 | iteration 17330/ 292968 | consumed samples: 35491840 | consumed tokens: 19079479296 | elapsed time per iteration (ms): 163611.6 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.943731E+00 | loss scale: 32768.0 | grad norm: 40092.133 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 
0 | samples per second: 0.013 | TFLOPs: 82.17 | iteration 17331/ 292968 | consumed samples: 35493888 | consumed tokens: 19081560064 | elapsed time per iteration (ms): 163115.0 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.926036E+00 | loss scale: 32768.0 | grad norm: 50567.099 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.42 | iteration 17332/ 292968 | consumed samples: 35495936 | consumed tokens: 19083640832 | elapsed time per iteration (ms): 164494.4 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.937063E+00 | loss scale: 32768.0 | grad norm: 37804.529 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 81.72 | iteration 17333/ 292968 | consumed samples: 35497984 | consumed tokens: 19085721600 | elapsed time per iteration (ms): 163654.6 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.933242E+00 | loss scale: 32768.0 | grad norm: 41947.655 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.14 | iteration 17334/ 292968 | consumed samples: 35500032 | consumed tokens: 19087802368 | elapsed time per iteration (ms): 164699.6 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.918557E+00 | loss scale: 32768.0 | grad norm: 31673.758 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 81.62 | iteration 17335/ 292968 | consumed samples: 35502080 | consumed tokens: 19089883136 | elapsed time per iteration (ms): 164094.5 | learning rate: 5.929E-05 | global batch size: 2048 | lm loss: 2.911911E+00 | loss scale: 32768.0 | grad norm: 46323.660 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 0.012 | TFLOPs: 81.92 | iteration 17336/ 292968 | consumed samples: 35504128 | consumed tokens: 19091963904 | elapsed time per iteration (ms): 167557.4 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.945362E+00 | loss scale: 32768.0 | grad norm: 50536.653 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 80.23 | iteration 17337/ 292968 | consumed samples: 35506176 | consumed tokens: 19094044672 | elapsed time per iteration (ms): 166808.8 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.901902E+00 | loss scale: 32768.0 | grad norm: 46713.044 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 80.59 | iteration 17338/ 292968 | consumed samples: 35508224 | consumed tokens: 19096125440 | elapsed time per iteration (ms): 165573.4 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.918969E+00 | loss scale: 32768.0 | grad norm: 43502.396 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 81.19 | iteration 17339/ 292968 | consumed samples: 35510272 | consumed tokens: 19098206208 | elapsed time per iteration (ms): 166859.7 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.899833E+00 | loss scale: 32768.0 | grad norm: 33183.238 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 80.57 | iteration 17340/ 292968 | consumed samples: 35512320 | consumed tokens: 19100286976 | elapsed time per iteration (ms): 165610.6 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.923946E+00 | loss scale: 32768.0 | grad norm: 53087.081 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 81.17 | iteration 17341/ 292968 | consumed samples: 35514368 | consumed tokens: 19102367744 | elapsed time per iteration (ms): 165756.7 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.934494E+00 | loss scale: 32768.0 | grad norm: 39619.690 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 81.10 | iteration 17342/ 292968 | consumed samples: 35516416 | consumed tokens: 19104448512 | elapsed time per iteration (ms): 165875.0 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.909921E+00 | loss scale: 32768.0 | grad norm: 52509.968 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 81.04 | iteration 17343/ 292968 | consumed samples: 35518464 | consumed tokens: 19106529280 | elapsed time per iteration (ms): 166217.5 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.952748E+00 | loss scale: 32768.0 | grad norm: 47411.159 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 80.88 | iteration 17344/ 292968 | consumed samples: 35520512 | consumed tokens: 19108610048 | elapsed time per iteration (ms): 166794.3 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.902324E+00 | loss scale: 32768.0 | grad norm: 34598.880 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 80.60 | iteration 17345/ 292968 | consumed samples: 35522560 | consumed tokens: 19110690816 | elapsed time per iteration (ms): 167594.3 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.953188E+00 | loss scale: 32768.0 | grad norm: 49425.117 | num zeros: 0.0 | curriculum seqlen: 1016 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 80.21 | iteration 17346/ 292968 | consumed samples: 35524608 | consumed tokens: 19112771584 | elapsed time per iteration (ms): 167013.8 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.928056E+00 | loss scale: 32768.0 | grad norm: 46086.079 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 80.49 | iteration 17347/ 292968 | consumed samples: 35526656 | consumed tokens: 19114852352 | elapsed time per iteration (ms): 167388.4 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.887190E+00 | loss scale: 32768.0 | grad norm: 47797.201 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 80.31 | iteration 17348/ 292968 | consumed samples: 35528704 | consumed tokens: 19116933120 | elapsed time per iteration (ms): 167863.7 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.879212E+00 | loss scale: 32768.0 | grad norm: 42872.343 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 80.08 | iteration 17349/ 292968 | consumed samples: 35530752 | consumed tokens: 19119013888 | elapsed time per iteration (ms): 167878.2 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.906527E+00 | loss scale: 32768.0 | grad norm: 48661.519 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 80.08 | iteration 17350/ 292968 | consumed samples: 35532800 | consumed tokens: 19121094656 | elapsed time per iteration (ms): 168772.5 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.902423E+00 | loss scale: 32768.0 | grad norm: 36146.621 | num zeros: 0.0 | curriculum 
seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 79.65 |
saving checkpoint at iteration 17350 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
[2022-02-07 10:17:53,656] [INFO] [logging.py:69:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/mp_rank_01_model_states.pt
[2022-02-07 10:17:53,667] [INFO] [logging.py:69:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/mp_rank_00_model_states.pt
[2022-02-07 10:18:07,378] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_109_optim_states.pt
[2022-02-07 10:18:07,703] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_105_optim_states.pt
[2022-02-07 10:18:08,367] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_108_optim_states.pt
[2022-02-07 10:18:08,470] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_106_optim_states.pt
[2022-02-07 10:18:09,052] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_104_optim_states.pt
[2022-02-07 10:18:09,320] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_111_optim_states.pt
[2022-02-07 10:18:09,341] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_107_optim_states.pt [2022-02-07 10:18:09,462] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_110_optim_states.pt [2022-02-07 10:18:15,049] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_75_optim_states.pt [2022-02-07 10:18:15,593] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_74_optim_states.pt [2022-02-07 10:18:16,918] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_96_optim_states.pt [2022-02-07 10:18:17,194] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_97_optim_states.pt [2022-02-07 10:18:17,343] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_103_optim_states.pt [2022-02-07 10:18:17,359] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_102_optim_states.pt [2022-02-07 10:18:17,431] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_101_optim_states.pt [2022-02-07 10:18:17,515] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_100_optim_states.pt [2022-02-07 10:18:18,078] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_16_optim_states.pt [2022-02-07 10:18:18,225] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_79_optim_states.pt [2022-02-07 10:18:18,485] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_98_optim_states.pt [2022-02-07 10:18:18,485] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_99_optim_states.pt [2022-02-07 10:18:18,767] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_24_optim_states.pt [2022-02-07 10:18:18,787] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_78_optim_states.pt [2022-02-07 10:18:18,814] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_14_optim_states.pt [2022-02-07 10:18:18,856] [INFO] [engine.py:3023:_save_zero_checkpoint] 
zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_113_optim_states.pt [2022-02-07 10:18:18,940] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_15_optim_states.pt [2022-02-07 10:18:18,954] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_30_optim_states.pt [2022-02-07 10:18:19,092] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_31_optim_states.pt [2022-02-07 10:18:19,715] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_25_optim_states.pt [2022-02-07 10:18:19,720] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_04_optim_states.pt [2022-02-07 10:18:19,649] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_76_optim_states.pt [2022-02-07 10:18:19,859] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_05_optim_states.pt [2022-02-07 10:18:19,919] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_122_optim_states.pt [2022-02-07 10:18:19,940] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_123_optim_states.pt [2022-02-07 10:18:20,162] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_17_optim_states.pt [2022-02-07 10:18:20,170] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_112_optim_states.pt [2022-02-07 10:18:20,156] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_77_optim_states.pt [2022-02-07 10:18:20,664] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_94_optim_states.pt [2022-02-07 10:18:21,275] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_07_optim_states.pt [2022-02-07 10:18:21,276] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_58_optim_states.pt [2022-02-07 10:18:21,357] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_72_optim_states.pt [2022-02-07 10:18:21,512] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_121_optim_states.pt [2022-02-07 
10:18:21,574] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_21_optim_states.pt [2022-02-07 10:18:21,711] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_27_optim_states.pt [2022-02-07 10:18:21,741] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_73_optim_states.pt [2022-02-07 10:18:22,102] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_26_optim_states.pt [2022-02-07 10:18:22,359] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_70_optim_states.pt [2022-02-07 10:18:22,878] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_71_optim_states.pt [2022-02-07 10:18:23,130] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_45_optim_states.pt [2022-02-07 10:18:23,146] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_84_optim_states.pt [2022-02-07 10:18:23,154] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_47_optim_states.pt [2022-02-07 10:18:23,167] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_59_optim_states.pt [2022-02-07 10:18:23,207] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_44_optim_states.pt [2022-02-07 10:18:23,447] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_52_optim_states.pt [2022-02-07 10:18:23,493] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_86_optim_states.pt [2022-02-07 10:18:23,728] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_85_optim_states.pt [2022-02-07 10:18:23,745] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_51_optim_states.pt [2022-02-07 10:18:23,948] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_06_optim_states.pt [2022-02-07 10:18:23,876] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_20_optim_states.pt [2022-02-07 10:18:24,102] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_65_optim_states.pt [2022-02-07 10:18:24,139] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_82_optim_states.pt [2022-02-07 10:18:24,259] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_28_optim_states.pt [2022-02-07 10:18:24,257] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_83_optim_states.pt [2022-02-07 10:18:24,347] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_64_optim_states.pt [2022-02-07 10:18:24,578] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_81_optim_states.pt [2022-02-07 10:18:24,606] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_50_optim_states.pt [2022-02-07 10:18:24,632] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_63_optim_states.pt [2022-02-07 10:18:24,650] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_29_optim_states.pt [2022-02-07 10:18:24,686] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_95_optim_states.pt [2022-02-07 10:18:24,803] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_53_optim_states.pt [2022-02-07 10:18:24,887] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_62_optim_states.pt [2022-02-07 10:18:24,991] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_87_optim_states.pt [2022-02-07 10:18:25,036] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_80_optim_states.pt [2022-02-07 10:18:25,371] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_68_optim_states.pt [2022-02-07 10:18:25,812] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_69_optim_states.pt [2022-02-07 10:18:25,910] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_10_optim_states.pt [2022-02-07 10:18:25,978] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_93_optim_states.pt [2022-02-07 
10:18:25,971] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_22_optim_states.pt [2022-02-07 10:18:26,006] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_57_optim_states.pt [2022-02-07 10:18:26,007] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_23_optim_states.pt [2022-02-07 10:18:26,168] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_67_optim_states.pt [2022-02-07 10:18:26,367] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_120_optim_states.pt [2022-02-07 10:18:26,541] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_66_optim_states.pt [2022-02-07 10:18:26,629] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_61_optim_states.pt [2022-02-07 10:18:26,783] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_92_optim_states.pt [2022-02-07 10:18:27,010] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_38_optim_states.pt [2022-02-07 10:18:27,093] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_88_optim_states.pt [2022-02-07 10:18:27,209] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_56_optim_states.pt [2022-02-07 10:18:27,190] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_89_optim_states.pt [2022-02-07 10:18:27,241] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_42_optim_states.pt [2022-02-07 10:18:27,257] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_19_optim_states.pt [2022-02-07 10:18:27,258] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_35_optim_states.pt [2022-02-07 10:18:27,223] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_60_optim_states.pt [2022-02-07 10:18:27,396] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_39_optim_states.pt [2022-02-07 10:18:27,640] [INFO] [engine.py:3023:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_18_optim_states.pt [2022-02-07 10:18:28,024] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_08_optim_states.pt [2022-02-07 10:18:28,050] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_11_optim_states.pt [2022-02-07 10:18:28,156] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_09_optim_states.pt [2022-02-07 10:18:28,426] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_91_optim_states.pt [2022-02-07 10:18:28,556] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_90_optim_states.pt [2022-02-07 10:18:28,633] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_116_optim_states.pt [2022-02-07 10:18:28,637] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_117_optim_states.pt [2022-02-07 10:18:28,908] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_43_optim_states.pt [2022-02-07 10:18:28,965] [INFO] 
[engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_118_optim_states.pt [2022-02-07 10:18:29,029] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_46_optim_states.pt [2022-02-07 10:18:29,063] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_03_optim_states.pt [2022-02-07 10:18:29,025] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_119_optim_states.pt [2022-02-07 10:18:29,468] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_13_optim_states.pt [2022-02-07 10:18:29,750] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_12_optim_states.pt [2022-02-07 10:18:29,790] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_49_optim_states.pt [2022-02-07 10:18:29,856] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_48_optim_states.pt [2022-02-07 10:18:29,997] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_115_optim_states.pt [2022-02-07 
10:18:30,125] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_114_optim_states.pt [2022-02-07 10:18:30,704] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_02_optim_states.pt [2022-02-07 10:18:30,995] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_55_optim_states.pt [2022-02-07 10:18:31,418] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_54_optim_states.pt [2022-02-07 10:18:32,427] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_34_optim_states.pt [2022-02-07 10:18:32,920] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_41_optim_states.pt [2022-02-07 10:18:32,950] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_36_optim_states.pt [2022-02-07 10:18:33,065] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_40_optim_states.pt [2022-02-07 10:18:33,098] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_37_optim_states.pt [2022-02-07 10:18:33,200] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_127_optim_states.pt [2022-02-07 10:18:33,243] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_126_optim_states.pt [2022-02-07 10:18:34,013] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-02-07 10:18:34,056] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_01_optim_states.pt [2022-02-07 10:18:34,651] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_33_optim_states.pt [2022-02-07 10:18:34,836] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_32_optim_states.pt [2022-02-07 10:18:36,546] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_124_optim_states.pt [2022-02-07 10:18:36,705] [INFO] [engine.py:3023:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100/global_step17350/zero_pp_rank_0_mp_rank_125_optim_states.pt successfully saved checkpoint at iteration 17350 to 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/cl-a100
time (ms) | save-checkpoint: 47689.05
iteration 17351/ 292968 | consumed samples: 35534848 | consumed tokens: 19123175424 | elapsed time per iteration (ms): 215422.9 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.960695E+00 | loss scale: 32768.0 | grad norm: 66279.114 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.010 | TFLOPs: 62.40 |
iteration 17352/ 292968 | consumed samples: 35536896 | consumed tokens: 19125256192 | elapsed time per iteration (ms): 166655.0 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.932477E+00 | loss scale: 32768.0 | grad norm: 44744.022 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 80.66 |
iteration 17353/ 292968 | consumed samples: 35538944 | consumed tokens: 19127336960 | elapsed time per iteration (ms): 167972.5 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.951037E+00 | loss scale: 32768.0 | grad norm: 54020.992 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 80.03 |
iteration 17354/ 292968 | consumed samples: 35540992 | consumed tokens: 19129417728 | elapsed time per iteration (ms): 169052.5 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.941555E+00 | loss scale: 32768.0 | grad norm: 52488.736 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 79.52 |
iteration 17355/ 292968 | consumed samples: 35543040 | consumed tokens: 19131498496 | elapsed time per iteration (ms): 167853.8 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.929975E+00 | loss scale: 32768.0 | grad norm: 42037.957 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 80.09 |
iteration 17356/ 292968 | consumed samples: 35545088 | consumed tokens: 19133579264 | elapsed time per iteration (ms): 167291.8 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.917343E+00 | loss scale: 32768.0 | grad norm: 55080.612 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 80.36 |
iteration 17357/ 292968 | consumed samples: 35547136 | consumed tokens: 19135660032 | elapsed time per iteration (ms): 166773.1 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.895092E+00 | loss scale: 32768.0 | grad norm: 69540.609 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 80.61 |
iteration 17358/ 292968 | consumed samples: 35549184 | consumed tokens: 19137740800 | elapsed time per iteration (ms): 167321.7 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.941427E+00 | loss scale: 32768.0 | grad norm: 41372.650 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 80.34 |
iteration 17359/ 292968 | consumed samples: 35551232 | consumed tokens: 19139821568 | elapsed time per iteration (ms): 163734.7 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.946260E+00 | loss scale: 32768.0 | grad norm: 80182.252 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.10 |
iteration 17360/ 292968 | consumed samples: 35553280 | consumed tokens: 19141902336 | elapsed time per iteration (ms): 163916.9 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.916407E+00 | loss scale: 32768.0 | grad norm: 51450.016 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 82.01 |
iteration 17361/ 292968 | consumed samples: 35555328 | consumed tokens: 19143983104 | elapsed time per iteration (ms): 163646.8 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.930764E+00 | loss scale: 32768.0 | grad norm: 91778.013 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.15 |
iteration 17362/ 292968 | consumed samples: 35557376 | consumed tokens: 19146063872 | elapsed time per iteration (ms): 164371.5 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.971426E+00 | loss scale: 32768.0 | grad norm: 54000.863 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 81.79 |
iteration 17363/ 292968 | consumed samples: 35559424 | consumed tokens: 19148144640 | elapsed time per iteration (ms): 163554.1 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.911135E+00 | loss scale: 32768.0 | grad norm: 73761.044 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.19 |
iteration 17364/ 292968 | consumed samples: 35561472 | consumed tokens: 19150225408 | elapsed time per iteration (ms): 164328.2 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.965653E+00 | loss scale: 32768.0 | grad norm: 58030.065 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 81.81 |
iteration 17365/ 292968 | consumed samples: 35563520 | consumed tokens: 19152306176 | elapsed time per iteration (ms): 163474.5 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.912571E+00 | loss scale: 32768.0 | grad norm: 42459.644 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.23 |
iteration 17366/ 292968 | consumed samples: 35565568 | consumed tokens: 19154386944 | elapsed time per iteration (ms): 164015.9 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.947087E+00 | loss scale: 32768.0 | grad norm: 51496.952 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 81.96 |
iteration 17367/ 292968 | consumed samples: 35567616 | consumed tokens: 19156467712 | elapsed time per iteration (ms): 167390.4 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.901920E+00 | loss scale: 32768.0 | grad norm: 39253.341 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 80.31 |
iteration 17368/ 292968 | consumed samples: 35569664 | consumed tokens: 19158548480 | elapsed time per iteration (ms): 163459.8 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.924090E+00 | loss scale: 32768.0 | grad norm: 75229.075 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.24 |
iteration 17369/ 292968 | consumed samples: 35571712 | consumed tokens: 19160629248 | elapsed time per iteration (ms): 164035.6 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.921795E+00 | loss scale: 32768.0 | grad norm: 118906.179 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 81.95 |
iteration 17370/ 292968 | consumed samples: 35573760 | consumed tokens: 19162710016 | elapsed time per iteration (ms): 163923.9 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 3.003695E+00 | loss scale: 32768.0 | grad norm: 49104.981 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 82.01 |
iteration 17371/ 292968 | consumed samples: 35575808 | consumed tokens: 19164790784 | elapsed time per iteration (ms): 164552.2 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.946100E+00 | loss scale: 32768.0 | grad norm: 67085.272 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 81.70 |
iteration 17372/ 292968 | consumed samples: 35577856 | consumed tokens: 19166871552 | elapsed time per iteration (ms): 163725.6 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.962209E+00 | loss scale: 32768.0 | grad norm: 55869.704 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.11 |
iteration 17373/ 292968 | consumed samples: 35579904 | consumed tokens: 19168952320 | elapsed time per iteration (ms): 168794.9 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.960222E+00 | loss scale: 32768.0 | grad norm: 49808.105 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 79.64 |
iteration 17374/ 292968 | consumed samples: 35581952 | consumed tokens: 19171033088 | elapsed time per iteration (ms): 163911.6 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.924549E+00 | loss scale: 32768.0 | grad norm: 51438.670 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 82.01 |
iteration 17375/ 292968 | consumed samples: 35584000 | consumed tokens: 19173113856 | elapsed time per iteration (ms): 163612.6 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.935894E+00 | loss scale: 32768.0 | grad norm: 41753.848 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.16 |
iteration 17376/ 292968 | consumed samples: 35586048 | consumed tokens: 19175194624 | elapsed time per iteration (ms): 163938.4 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.954255E+00 | loss scale: 32768.0 | grad norm: 60858.549 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 82.00 |
iteration 17377/ 292968 | consumed samples: 35588096 | consumed tokens: 19177275392 | elapsed time per iteration (ms): 163676.6 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.974581E+00 | loss scale: 32768.0 | grad norm: 57366.389 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.13 |
iteration 17378/ 292968 | consumed samples: 35590144 | consumed tokens: 19179356160 | elapsed time per iteration (ms): 164137.1 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.921101E+00 | loss scale: 32768.0 | grad norm: 57567.701 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 81.90 |
iteration 17379/ 292968 | consumed samples: 35592192 | consumed tokens: 19181436928 | elapsed time per iteration (ms): 163298.7 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.944838E+00 | loss scale: 32768.0 | grad norm: 41110.831 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.32 |
iteration 17380/ 292968 | consumed samples: 35594240 | consumed tokens: 19183517696 | elapsed time per iteration (ms): 163425.8 | learning rate: 5.928E-05 | global batch size:
2048 | lm loss: 2.923714E+00 | loss scale: 32768.0 | grad norm: 71509.649 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.26 | iteration 17381/ 292968 | consumed samples: 35596288 | consumed tokens: 19185598464 | elapsed time per iteration (ms): 163373.3 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.950095E+00 | loss scale: 32768.0 | grad norm: 48793.697 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.29 | iteration 17382/ 292968 | consumed samples: 35598336 | consumed tokens: 19187679232 | elapsed time per iteration (ms): 163510.2 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.926193E+00 | loss scale: 32768.0 | grad norm: 79087.600 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.22 | iteration 17383/ 292968 | consumed samples: 35600384 | consumed tokens: 19189760000 | elapsed time per iteration (ms): 163953.3 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.949289E+00 | loss scale: 32768.0 | grad norm: 56224.461 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 81.99 | iteration 17384/ 292968 | consumed samples: 35602432 | consumed tokens: 19191840768 | elapsed time per iteration (ms): 163714.9 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.959245E+00 | loss scale: 32768.0 | grad norm: 61999.685 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.013 | TFLOPs: 82.11 | iteration 17385/ 292968 | consumed samples: 35604480 | consumed tokens: 19193921536 | elapsed time per iteration (ms): 163975.8 | learning rate: 5.928E-05 | 
global batch size: 2048 | lm loss: 2.922317E+00 | loss scale: 32768.0 | grad norm: 46045.239 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 81.98 | iteration 17386/ 292968 | consumed samples: 35606528 | consumed tokens: 19196002304 | elapsed time per iteration (ms): 164017.9 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.914865E+00 | loss scale: 32768.0 | grad norm: 47356.528 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 81.96 | iteration 17387/ 292968 | consumed samples: 35608576 | consumed tokens: 19198083072 | elapsed time per iteration (ms): 163932.5 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.926043E+00 | loss scale: 32768.0 | grad norm: 48906.133 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 82.00 | iteration 17388/ 292968 | consumed samples: 35610624 | consumed tokens: 19200163840 | elapsed time per iteration (ms): 165822.1 | learning rate: 5.928E-05 | global batch size: 2048 | lm loss: 2.923313E+00 | loss scale: 32768.0 | grad norm: 42431.743 | num zeros: 0.0 | curriculum seqlen: 1016 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 0.012 | TFLOPs: 81.07 | WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 625320 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 625321 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 626212 closing signal SIGTERM 
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 626213 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 625322 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 624803 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 617267 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 625323 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 622851 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 624804 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 624106 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 626214 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 617268 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 622852 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 624107 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 626215 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers 
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 617157 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 624815 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 622853 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 624682 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 618380 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 624108 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 621088 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 568811 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 617158 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 623381 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 616767 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 624683 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 624805 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 618381 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 617269 closing signal SIGTERM 
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 624816 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 621089 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 616768 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 623382 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 568812 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 617270 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 624817 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 624109 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 618382 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 617159 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 624684 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 626216 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 624806 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 621090 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 616769 closing signal SIGTERM WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 625324 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 568813 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 609434 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 622854 closing signal SIGTERM 
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 624807 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 624818 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 617271 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 617160 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 623383 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 624110 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 609435 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 624685 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 618383 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 626217 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 617161 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 625325 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 626218 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 616770 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 624686 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 621091 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 623384 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 622855 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 568814 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 624808 closing signal SIGTERM 
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 624819 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 624111 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 618384 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 609436 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 625326 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 609437 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 616771 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 622856 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 626219 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 617272 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 623385 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 624820 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 624809 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 568815 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 623386 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 621092 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 618385 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 616772 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 617162 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 624112 closing signal SIGTERM 
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 622857 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 617273 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 624113 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 617163 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 623387 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 624687 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 617274 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 624821 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 568816 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 618386 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 624810 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 624688 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 624822 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 625327 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 622858 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 568817 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 623388 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 609438 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 616773 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 568818 closing signal SIGTERM 
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 618387 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 621093 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 616774 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 617164 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 624689 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 609439 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 609440 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 621094 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 621095 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 609441 closing signal SIGTERM
slurmstepd: error: *** STEP 1794034.0 ON jean-zay-iam25 CANCELLED AT 2022-02-07T12:05:23 ***
WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 614211 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 614212 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 614213 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 614214 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 614215 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 614216 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 614217 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 614218 closing signal SIGTERM